(57) Abstract

The invention provides a method of synthesizing a repertoire of oligonucleotide tags comprising the steps of synthesizing a
repertoire of oligonucleotide tag complements on one or more solid phase supports, cleaving a fraction of the oligonucleotide tag
complements from the support(s), and inserting the cleaved tag complements into a cloning vector, as depicted in the figure. The
cloned tag complements can conventionally be conjugated to selected polynucleotides to produce tagged polynucleotides having
unique tag sequences which can be captured and sorted by hybridization to corresponding tag complements. Also provided are
various reagents and components that are used or produced in the method.
Method for Making Complementary Oligonucleotide Tag Sets
Field of the Invention 

The present invention relates generally to methods for synthesizing collections of oligonucleotide
tags which may be used for identifying, sorting, and/or tracking molecules, especially polynucleotides.
In a particular aspect, the present invention relates to a method for forming complementary sets of
oligonucleotide tags and tag complements, which are particularly useful in immobilizing tagged entities
on addressable arrays of tag complements.

Background of the Invention

Specific hybridization of oligonucleotides and their analogs is a fundamental process that is
employed in a wide variety of research, medical, and industrial applications, including the identification
of disease-related polynucleotides in diagnostic assays, screening for clones of novel target
polynucleotides, identification of specific polynucleotides in blots of mixtures of polynucleotides,
amplification of specific target polynucleotides, therapeutic blocking of inappropriately expressed
genes, DNA sequencing, and the like, e.g. Sambrook et al., Molecular Cloning: A Laboratory Manual,
2nd Edition (Cold Spring Harbor Laboratory, New York, 1989); Keller and Manak, DNA Probes, 2nd
Edition (Stockton Press, New York, 1993); Milligan et al., J. Med. Chem., 36: 1923-1937 (1993);
Drmanac et al., Science, 260: 1649-1652 (1993); Bains, J. DNA Sequencing and Mapping, 4: 143-150
(1993).

Specific hybridization has also been proposed as a method of tracking, retrieving, and identifying
compounds labeled with oligonucleotide tags, e.g. Brenner, International App. No. PCT/US95/12791
(Pub. No. WO 96/12014); Church et al., Science, 240: 185-188 (1988); Brenner and Lerner, Proc. Natl.
Acad. Sci., 89: 5381-5383 (1992); Alper, Science, 264: 1399-1401 (1994); and Needels et al., Proc.
Natl. Acad. Sci., 90: 10700-10704 (1993). The successful implementation of such tagging schemes
depends in large part on achieving specific hybridization between a tag and its
complementary probe. That is, for an oligonucleotide tag to successfully identify a substance, the
number of false positive and false negative signals must be minimized. Unfortunately, such spurious
signals are not uncommon because base-pairing and base-stacking free energies vary widely among
nucleotides in a duplex or triplex structure. For example, a duplex consisting of a repeated sequence of
deoxyadenosine (A) and thymidine (T) bound to its complement may have less stability than an equal-length
duplex consisting of a repeated sequence of deoxyguanosine (G) and deoxycytidine (C) bound to
a partially complementary target containing a mismatch. Thus, if a desired compound from a large
combinatorial chemical library were tagged with the former oligonucleotide, a significant possibility
would exist that, under hybridization conditions designed to detect perfectly matched AT-rich duplexes,
undesired compounds labeled with the GC-rich oligonucleotide, even in a mismatched duplex, would
be detected along with the perfectly matched duplexes consisting of the AT-rich tag. Even though
reagents such as tetramethylammonium chloride are available to negate base-specific stability
differences of oligonucleotide duplexes, the effect of such reagents is often limited, and their presence
can be incompatible with, or render more difficult, further manipulations of the selected compounds,
e.g. amplification by polymerase chain reaction (PCR), or the like. Such problems have been addressed
by Brenner, as described in WO 96/12014 (supra) and International App. No. PCT/US96/09513 (WO
96/41011), in the development of minimally cross-hybridizing oligonucleotide tag sets which minimize
the occurrence of mismatch hybridization.
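The AT-rich versus GC-rich comparison above can be made concrete with a crude estimate. The sketch below assumes the Wallace rule of thumb (Tm of roughly 2 °C per A·T pair plus 4 °C per G·C pair), which is not part of the source and ignores nearest-neighbor stacking, salt, and mismatch position. The one-mismatch case is modeled, very roughly, by simply dropping one G·C pair from the count, so the numbers are illustrative only.

```python
def wallace_tm(paired_bases):
    """Rough Tm estimate (degrees C) from the bases that are Watson-Crick paired.

    Wallace rule of thumb: 2 degrees per A/T pair, 4 degrees per G/C pair.
    Only meaningful as a qualitative comparison for short duplexes.
    """
    at_pairs = sum(1 for base in paired_bases if base in "AT")
    gc_pairs = sum(1 for base in paired_bases if base in "GC")
    return 2 * at_pairs + 4 * gc_pairs


at_rich = "AT" * 8             # 16-mer AT-rich tag, perfectly matched
gc_rich = "GC" * 8             # 16-mer GC-rich tag
gc_one_mismatch = gc_rich[:-1] # crude assumption: a mismatch removes one G/C pair

print(wallace_tm(at_rich))          # 32
print(wallace_tm(gc_one_mismatch))  # 60
```

Even with one pair removed, the GC-rich duplex scores well above the perfectly matched AT-rich one, which is the qualitative problem the paragraph describes.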

Unfortunately, however, current methods of solid phase oligonucleotide synthesis, although highly 
efficient, still result in a significant fraction of failure sequences, particularly when oligonucleotides 
exceed 30 to 40 nucleotides in length. When mixtures of such oligonucleotides are employed as tags, 
particularly for sorting, an important consequence of failure sequences is that a fraction of the tags will
not have complements whenever tags and tag complements are synthesized separately. Thus, in the sorting of
complex mixtures, e.g. as called for in the process described in Brenner (cited above), such failures will 
limit the completeness of the sorting process. 
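The scale of the failure-sequence problem follows from stepwise coupling arithmetic. The sketch below assumes that each coupling succeeds independently with the same efficiency and that the first nucleoside is already attached to the support, so an n-mer needs n - 1 couplings; the 99% efficiency is an illustrative value, not a figure from the source.

```python
def full_length_fraction(coupling_efficiency, length):
    """Expected fraction of chains that reach full length.

    Assumes (length - 1) independent couplings, each succeeding with
    probability `coupling_efficiency` (first nucleoside pre-attached).
    """
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length - 1)


for n in (20, 30, 40):
    frac = full_length_fraction(0.99, n)
    print(f"{n}-mer: {frac:.1%} full length, {1 - frac:.1%} failure sequences")
```

Under these assumptions roughly a third of 40-mer chains are failure sequences, consistent with the statement that the problem becomes significant beyond 30 to 40 nucleotides.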

In view of the above, it would be useful if there were available a means for synthesizing sets of 
oligonucleotide tags that would ensure that every tag had a complement regardless of the efficiency of 
the synthetic process employed. 

Summary of the Invention 

An object of the present invention is to provide a method of synthesizing one or more repertoires 
of oligonucleotide tags that ensures that every oligonucleotide tag in a repertoire will have a 
corresponding tag complement. 

Another object of the invention is to provide a system for sorting polynucleotides onto solid phase 
supports by way of oligonucleotide tags produced by the method of the invention. 

A further object of the invention is to provide a method of synthesizing oligonucleotide tags by the
combinatorial addition of subunits (words).

These and other objects of the invention are achieved by providing a method of synthesis 
comprising the steps of synthesizing a repertoire of oligonucleotide tag complements on one or more 
solid phase supports, cleaving a fraction of the oligonucleotide tag complements from the support(s),
and inserting the cleaved tag complements into a cloning vector. The cloned tag complements can 
conveniently be conjugated to selected substances, preferably biomolecules such as polynucleotides or 
polypeptides, to produce tagged substances having unique tag sequences which can be captured and 
sorted by hybridization to corresponding tag complements. 

In one embodiment, the invention provides a method of synthesizing a repertoire of oligonucleotide
tags. In the method, a plurality of different-sequence oligonucleotide populations are synthesized on one or
more solid phase supports, such that each population is located at a spatially discrete region relative to the
other populations, and the oligonucleotides in each population comprise a tag-complement sequence that is
the same for every oligonucleotide in a given population, but which may be different from the tag-complement
sequence in every other oligonucleotide population. A fraction of the oligonucleotides are
cleaved from each population on the one or more supports, to release a mixture of different-sequence
tag-complement oligonucleotides. A primer is annealed to each tag-complement oligonucleotide, and the
annealed primers are extended to form a duplex comprising a tag-complement strand and a complementary
tag strand, such that the tag-complement strands or the duplexes thus formed comprise an oligonucleotide
tag repertoire. In one preferred embodiment, the oligonucleotides are bound to the one or more supports by
their 3'-termini, and each oligonucleotide contains a universal primer-binding sequence on the 3'-side of the
tag complement sequence. In another preferred embodiment, the fraction of oligonucleotides cleaved from
each population is from 10 to 30%. According to yet another preferred embodiment, a selected fraction of
each population of synthesized oligonucleotides is bound to the one or more supports via base-cleavable
linkages.
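The cleave-and-extend steps above can be sketched as a sequence-level simulation. This is a minimal model, not the chemistry: the primer-binding site `PRIMER_SITE`, the tag-complement sequences, and the molecule counts are hypothetical, oligonucleotides are written 5'->3' with the support on the 3' side, and primer annealing plus extension is modeled simply as taking the reverse complement of the released template.

```python
# Hypothetical universal primer-binding site on the 3' side of every tag complement.
PRIMER_SITE = "GGATCCAAGC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def cleave_fraction(populations, fraction):
    """Split each immobilized population into released and retained counts.

    `populations` maps a tag-complement sequence to the number of molecules
    in its spatially discrete region (or on its bead).
    """
    released, retained = {}, {}
    for tag_complement, count in populations.items():
        n_released = round(count * fraction)
        released[tag_complement] = n_released
        retained[tag_complement] = count - n_released
    return released, retained


def extend_primer(template):
    """Model annealing a primer to the 3' primer site and extending it.

    The extension product is the primer followed by the tag strand, i.e.
    the reverse complement of the released template.
    """
    if not template.endswith(PRIMER_SITE):
        raise ValueError("template lacks the 3' primer-binding site")
    return reverse_complement(template)


populations = {"ACTGTTAC": 1000, "TGACAAGT": 1000, "CTTGACAG": 1000}
released, retained = cleave_fraction(populations, 0.2)  # within the 10-30% range
tag_strands = {tc: extend_primer(tc + PRIMER_SITE)
               for tc, n in released.items() if n > 0}

for tc, strand in tag_strands.items():
    # Each new strand pairs perfectly with the complement still on the support.
    assert reverse_complement(strand) == tc + PRIMER_SITE
    print(tc, "->", strand, "| retained on support:", retained[tc])
```

Because every tag strand is copied from a molecule that was itself cleaved from a population, each tag in the resulting repertoire has a matching complement left on the support, whatever the synthesis yield.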

The invention also includes a repertoire of tag oligonucleotides prepared as above. 

Also provided is a method of forming a tag-vector library comprising multiple members from a tag
repertoire. In the method, an oligonucleotide tag repertoire prepared as above is inserted by ligation into 
multiple copies of a selected vector, to form a tag-vector library comprising members of the tag repertoire. 
The invention also includes a tag-vector library formed by the method. 

The invention also includes a method of forming a library of tagged polynucleotide fragments. In the
method, a tag-vector library from above is combined with a mixture of different polynucleotide fragments
under ligation conditions effective to insert a polynucleotide fragment into each tag-vector at a site adjacent 
to a tag-sequence in each vector. The invention also includes a library of tagged polynucleotide fragments 
formed by the method. 
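Whether each fragment ends up with a different tag depends on how many fragments are combined with the tag-vector library relative to the number of distinct tags. The sketch below assumes, for illustration only, that each fragment is ligated into a tag-vector drawn uniformly at random from the library; the repertoire and fragment counts are hypothetical, and the source does not specify this model.

```python
import random
from collections import Counter


def expected_unique_fraction(n_tags, n_fragments):
    """Probability that a fragment's tag is shared with no other fragment,
    assuming tags are assigned uniformly at random with replacement."""
    return (1 - 1 / n_tags) ** (n_fragments - 1)


def simulated_unique_fraction(n_tags, n_fragments, seed=0):
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    draws = [rng.randrange(n_tags) for _ in range(n_fragments)]
    counts = Counter(draws)
    return sum(1 for tag in draws if counts[tag] == 1) / n_fragments


# Hypothetical numbers: 10^4 fragments tagged from a repertoire of 10^6 tags.
print(round(expected_unique_fraction(10**6, 10**4), 3))
print(round(simulated_unique_fraction(10**6, 10**4), 3))
```

Under this model, keeping the number of fragments small relative to the repertoire (here 1%) leaves about 99% of fragments with a tag no other fragment carries.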

In yet another aspect, the invention also includes a system for sorting polynucleotides. The system 
includes one or more solid phase supports on which are attached a plurality of oligonucleotide populations, 
such that each population is located at a spatially discrete region on the one or more supports relative to the 
other populations, and the oligonucleotides in each population comprise a tag-complement sequence that is 
the same for every oligonucleotide in a given population. The system also includes a tag composition 
selected from the group consisting of a tag repertoire, a tag-vector library, or a library of tagged
polynucleotide fragments, which may be prepared as above. 

The invention also provides a solid phase support comprising a plurality of immobilized 
oligonucleotides containing identical tag sequences, wherein at least one of the oligonucleotides is 
linked to the support by a cleavable linker, and at least one other of the oligonucleotides is linked to the 
support by a non-cleavable linker. In a first preferred embodiment, the support is a bead. In a second 
preferred embodiment, the support comprises a planar array of addressable regions, each region 
containing a plurality of immobilized oligonucleotides containing identical tag sequences, such that the
tag sequences in at least two of the regions are different from each other. Preferably, the cleavable
linker is a base-labile linker. The solid phase supports of the invention are particularly useful for 
generating repertoires of tag oligonucleotides that are substantially complementary to the 
oligonucleotides immobilized on the supports. 

Thus, the invention also includes a solid phase bead or particle comprising a population of 
oligonucleotides attached to the bead or particle, wherein the population comprises first and second classes 
of oligonucleotides which contain identical oligonucleotide sequences, and the oligonucleotides in the first 
class contain a cleavable linking moiety that permits selective cleavage of the first class of oligonucleotides 
from the support, without significantly cleaving the second class of oligonucleotides.

In another embodiment, the invention includes a mixture of solid phase beads or particles of the type 
described above, wherein at least two beads or particles in the mixture contain oligonucleotide populations 
with different tag-complement sequences. 

These and other objects and features of the present invention will become more apparent from the 
following description and the appended drawings. 

Brief Description of the Drawings 

Figure 1 shows an exemplary pair of oligonucleotide linking structures bound to a solid phase 
bead, and a mechanism of cleaving one of the linkers to release an attached oligonucleotide containing 
a tag complement (TC I), while the other linker remains intact.

Figure 2 shows a general scheme for practicing one embodiment of the invention, wherein a
portion of a repertoire of immobilized oligonucleotides containing different-sequence tag complements
is cleaved from the solid phase support, converted to double-stranded form, and inserted into a cloning
vector. A tag-vector (i) library is combined with a mixture of different polynucleotide fragments under
ligation conditions effective to insert the different fragments at a site adjacent to a tag-sequence in each
vector, so that each sample fragment (S_i) becomes associated with a different tag (T_i). The tag-sample
fragments may then be excised and/or amplified, and the tagged fragments sorted by hybridization to
immobilized corresponding tag complements.

Detailed Description of the Invention 

The present invention provides methods and reagents for synthesizing matched sets (also referred 
to as repertoires) of tags and tag complements having high complementarity with each other. Such 
matched sets are particularly useful for labeling and sorting tagged molecules for parallel operations 
such as sequencing, fingerprinting, or other types of analysis. 

In one aspect of the invention, populations of oligonucleotides are synthesized wherein each 
population contains identical tag complements that are linked to a support by a mixture of cleavable and 
non-cleavable linking moieties. The cleavable linker permits a portion of the oligonucleotides in each
population to be cleaved from the support to serve as templates for forming oligonucleotides whose
sequences are perfectly complementary to the tag complements remaining on the support. By cleaving 
a portion of the tag complements from the one or more supports, and using the cleaved tag 
complements as templates to form complementary tags, the method produces matched tag and tag 
complement repertoires in greater yield and fidelity than previous methods. The invention therefore 
enhances hybridization specificity and tag sequence complexity relative to what is achieved when tags 
and their complements are synthesized independently. 

The invention provides a substantial improvement in techniques that utilize oligonucleotide tags, 
by providing a method for enhancing the fidelity of synthesis of tag complements. Previously, tags 
were synthesized independently from tag complements, such that there was no assurance that all 
intended tags and tag complements were actually synthesized. In particular, previous polynucleotide
synthesis methods were susceptible to significant or complete failures in one or more nucleotide
addition steps, such that resultant tags or tag complements did not have the intended sequences and
could not hybridize perfectly with their intended partners. Defective synthesis of oligonucleotides can
prevent tagged sample species from locating and hybridizing to the intended tag complements, resulting
in loss of signal and/or underestimation of the abundance of the affected sample species. In addition,
the presence of defective tag sequences could lead to cross-hybridization with other tags, interfering
with hybridization of those tags to their corresponding complements. Moreover, the complexity of the
tag repertoires and tag complement repertoires was diminished.

These problems are overcome by the present invention, by using a selected portion of immobilized 
tag complement sequences as templates to form a repertoire of corresponding tags. Ligation of the tags
to selected sample substances, preferably biomolecules such as polynucleotides or polypeptides,
yields a mixture of tagged substances (e.g., sample polynucleotide fragments) which can be captured on 
an array of tag complements. The invention thus provides a way of ensuring that a tag repertoire, or a 
subset of a repertoire, can hybridize to a full range of tag complements. 

I. Definitions

"Tag" or "tag oligonucleotide" or "oligonucleotide tag" refers to an oligonucleotide that contains a
nucleotide sequence capable of serving as an identifier label for an attached species, such as a sample 
oligonucleotide, and which is detectably distinguishable from other tags in a tag repertoire. 

"Complement" or "tag complement" as used herein refers to an oligonucleotide to which an 
oligonucleotide tag specifically hybridizes to form a perfectly matched duplex or triplex. In 
embodiments where specific hybridization results in a triplex, the oligonucleotide tag may be selected 
to be either double-stranded or single-stranded. Thus, where triplexes are formed, the term
"complement" is meant to encompass either a double-stranded complement of a single-stranded
oligonucleotide tag or a single-stranded complement of a double-stranded oligonucleotide tag. 
The term "oligonucleotide" as used herein means a linear oligomer of natural or modified
nucleosidic monomers, including deoxyribonucleosides and ribonucleosides. Usually monomers are
linked by phosphodiester bonds or analogs thereof (e.g., phosphorothioates, phosphoramidates,
phosphonates, peptide nucleic acids (N-derivatized glycine polymers), and the like) to form
oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to several tens of monomeric
units, and up to 100 monomeric units. Whenever an oligonucleotide is represented by a sequence of
letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' order from left to
right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine,
and "T" denotes thymidine, unless otherwise noted. Usually oligonucleotides of the invention comprise
the four natural nucleotides. However, they may also comprise non-natural nucleotide analogs. It is
clear to those skilled in the art when oligonucleotides having natural or non-natural nucleotides may be
employed; e.g. where processing by enzymes is called for, usually oligonucleotides consisting of
natural nucleotides are required.

"Polynucleotide" or "polynucleotide fragment" refers to a polymer containing two or more
nucleotides. "Polynucleotide" encompasses "oligonucleotide", although the length of a polynucleotide
can exceed 100 monomeric units. 

"Perfectly matched" in reference to a duplex means that the poly- or oligonucleotide strands 
making up the duplex form a double-stranded structure with one another such that every nucleotide in
each strand undergoes Watson-Crick basepairing with a nucleotide in the other strand. The term also 
encompasses the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine 
bases, and the like, that may be employed. In reference to a triplex, the term means that the triplex 
consists of a perfectly matched duplex and a third strand in which every nucleotide undergoes 
Hoogsteen or reverse Hoogsteen association with a basepair of the perfectly matched duplex. 
Conversely, a "mismatch" in a duplex between a tag and an oligonucleotide means that a pair or triplet 
of nucleotides in the duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse 
Hoogsteen bonding. 
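The duplex definitions above, together with the 5'->3' writing convention, can be checked mechanically. The sketch below assumes equal-length, ungapped strands of the four natural nucleotides and covers duplexes only; nucleoside analogs such as deoxyinosine and Hoogsteen or reverse Hoogsteen triplex pairing are outside its scope.

```python
WATSON_CRICK_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def count_mismatches(strand1, strand2):
    """Count non-Watson-Crick positions in a duplex of two strands.

    Both strands are written 5'->3'; because the strands are antiparallel,
    position i of strand1 faces position i of strand2 read in reverse.
    """
    if len(strand1) != len(strand2):
        raise ValueError("this sketch only handles equal-length, ungapped duplexes")
    return sum((a, b) not in WATSON_CRICK_PAIRS
               for a, b in zip(strand1, reversed(strand2)))


def is_perfectly_matched(strand1, strand2):
    """True when every nucleotide undergoes Watson-Crick pairing."""
    return count_mismatches(strand1, strand2) == 0


print(is_perfectly_matched("ATGCCTG", "CAGGCAT"))  # True
print(count_mismatches("ATGCCTG", "CAGTCAT"))      # 1
```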

As used herein, "nucleoside" includes the natural nucleosides, including 2'-deoxy and 2'-hydroxyl
forms, as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992).
"Analogs" in reference to nucleosides includes synthetic nucleosides having modified base moieties
and/or modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley, New York,
1980); Uhlmann and Peyman, Chemical Reviews, 90: 543-584 (1990), or the like, with the only proviso
that they are capable of specific hybridization. Such analogs include synthetic nucleosides designed to 
enhance binding properties, reduce complexity, increase specificity, and the like. 

"Nucleotide" refers to a nucleoside or nucleoside analog having one or more phosphate groups (or
analog thereof) attached to a 5', 3', and/or 2'-hydroxyl, or a nucleoside contained in an oligonucleotide
or polynucleotide. 
As used herein, "complexity" in reference to a population of polynucleotides means the number of
different species of molecule present in the population.

As used herein, "repertoire" refers to a collection or mixture of different-sequence
oligonucleotides, e.g., comprising oligonucleotide tags or tag complements. A "repertoire" can have a
defined complexity.

As used herein, "cleavable linking moiety" refers to a chemical group that is cleavable under
selected chemical conditions. The term "cleavable linker" refers to a compound or chemical group that
contains a cleavable linking moiety. For example, the term "cleavable linker" encompasses an
activated form of a linker compound that can be coupled via a first linking functionality to a support,
and via a second functionality to an oligonucleotide. The cleavable linker can contain a cleavable
linking moiety, such as an ester group, which is cleavable under selected cleavage conditions.



II. Components 

The following sub-sections discuss various components, e.g., materials and reagents, that may be
used to practice various aspects of the present invention.

A. Solid Support

Solid phase supports for use with the invention can have any of a wide variety of forms, including
microparticles, beads, membranes, slides, plates, micro-machined chips, and the like. Likewise, solid
phase supports of the invention may comprise a wide variety of materials, including glass, plastic,
silicon, alkanethiolate-derivatized gold, cellulose, low cross-linked and high cross-linked polystyrene,
silica gel, polyamide, and the like. Preferably, either a population of discrete particles is employed,
such that each particle has a uniform coating, or population, of complementary sequences for the same
tag (and no other), or one or more supports are employed with spatially discrete regions, each region
containing a uniform coating, or population, of sequences that are complementary to the same tag (and
no other). In the latter embodiment, the area of the regions may vary according to particular
applications. Usually, the regions range in area from several μm², e.g. 3-5 μm², to several hundred
μm², e.g. 100-500 μm². Preferably, such regions are spatially discrete so that signals generated by
events, e.g. fluorescent emissions, at adjacent regions can be resolved from each other by the detection
system being employed. In some applications, it may be desirable to have regions with uniform
coatings of more than one tag complement, e.g. for simultaneous sequence analysis, or for bringing
separately tagged molecules into close proximity.

For example, a wide variety of microparticle supports may be used with the invention, including
microparticles made of controlled pore glass (CPG), highly cross-linked polystyrene, acrylic
copolymers, cellulose, nylon, dextran, latex, polyacrolein, and the like, disclosed in the following
exemplary references: Meth. Enzymol., Section A, Vol. 44, pages 11-147 (Academic Press, New
York, 1976); U.S. patents 4,678,814; 4,413,070; and 4,046,720; and Pon, Chapter 19, in Agrawal,
editor, Methods in Molecular Biology, Vol. 20 (Humana Press, Totowa, NJ, 1993). Microparticle
supports further include commercially available nucleoside-derivatized CPG and polystyrene beads
(e.g. available from Perkin-Elmer Applied Biosystems, Foster City, CA); derivatized magnetic beads;
polystyrene grafted with polyethylene glycol (e.g., TentaGel™, Rapp Polymere, Tubingen, Germany);
and the like. Selection of the support characteristics, such as material, porosity, size, shape, and the
like, and the type of linking moiety employed will depend on the conditions under which the tags are
used. For example, in applications involving successive processing with enzymes, supports and linkers
that minimize steric hindrance of the enzymes and that facilitate access to substrate are preferred. Other
important factors to be considered in selecting the most appropriate microparticle support include size
uniformity, efficiency as a synthesis support, surface area, and optical properties. It is noted that
smooth, clear (transparent) beads provide instrumentational advantages when handling large numbers
of beads on a surface.

As mentioned above, tag complements may also be synthesized on one or more solid phase
supports to form an array of regions uniformly coated with tag complements. That is, within each
region in such an array the same tag complement is synthesized. Techniques for synthesizing such
arrays are now well known and are disclosed, for example, in McGall et al., PCT App. No.
PCT/US93/03767 (WO 93/22680, Affymax); Pease et al., Proc. Natl. Acad. Sci., 91: 5022-5026 (1994);
Southern and Maskos, PCT App. No. PCT/GB89/01114 (WO 90/03382, Isis); Southern et al.,
Genomics, 13: 1008-1017 (1992); and Maskos and Southern, Nucleic Acids Research, 21: 4663-4669
(1993).

Preferably, the invention is implemented with microparticles or beads uniformly coated with
complements of the same tag sequence. Microparticle supports and methods of covalently or
noncovalently linking oligonucleotides to their surfaces are well known, as exemplified by the
following references: Beaucage and Iyer (cited above); Gait, editor, Oligonucleotide Synthesis: A
Practical Approach (IRL Press, Oxford, 1984 and 1990 editions); and the references cited above.

Generally, the size and shape of a microparticle is not critical; however, microparticles in the size range
of a few, e.g. 1-2, to several hundred, e.g. 200-1000, μm diameter are preferable, as they facilitate the
construction and manipulation of large repertoires of oligonucleotide tags with minimal reagent and
sample usage.

In some preferred applications, commercially available controlled-pore glass (CPG) or polystyrene
supports are employed as solid phase supports in the invention. Such supports are available with
base-labile linkers and initial nucleosides attached, e.g. from Applied Biosystems (Foster City, CA).
Preferably, microparticles having pore size between 500 and 1000 angstroms are employed.
In other preferred applications, non-porous microparticles are employed for their optical
properties, which may be advantageously used when tracking large numbers of microparticles on planar
supports, such as a microscope slide. Particularly preferred non-porous microparticles are the glycidyl
methacrylate (GMA) beads available from Bangs Laboratories (Carmel, IN). Such microparticles are
available in a variety of sizes and derivatized with a variety of linkage groups for synthesizing tags or tag
complements. Preferably, for massively parallel manipulations of tagged microparticles, GMA beads
having diameters of about 5 μm are employed.

B. Tag Sequences

The tags of the invention are designed to provide unique tagging sequences for sample species of 
interest, and for specific hybridization to corresponding tag complements immobilized on one or more 
solid supports. The sequences of the tags (and tag complements) can be any group of sequences 
selected by the user, provided that (1) each different tag and tag complement does not significantly 
cross-hybridize with any other tag or tag complement in the repertoire, under selective hybridization 
conditions for sorting, (2) each tag and tag complement does not form secondary structure that would 
prevent specific hybridization with the intended complementary sequence, and (3) each tag can
specifically recognize and hybridize to its corresponding tag complement under the same conditions
for all tags in the selected tag repertoire. Generally, the number of different tags increases in proportion 
to the number of different sample species that are to be processed in parallel. Tag sequences may be 
designed in accordance with any suitable method, such that the desired sequence specificity and 
sensitivity is achieved. 

In a preferred embodiment of the invention, the oligonucleotide tag sequences are selected from a
"minimally cross-hybridizing set" of oligonucleotide sequences. Preferably, the sequences of any two
oligonucleotide tags of a tag repertoire differ by at least two nucleotides when the oligonucleotides are
aligned to achieve maximum sequence identity. In other words, duplex or triplex formation between
any tag in a repertoire and any other tag in the repertoire, or a tag complement thereof, results in at least
two mismatches. In a more particular embodiment, the tag sequences in a tag repertoire are designed to 
ensure that duplexes and triplexes cannot form with any other tags or tag complements in the repertoire 
without forming at least three mismatched nucleotides, and so on. In such embodiments, greater 
specificity is achieved as the minimum number of mismatches increases, but the total repertoire of tags 
is smaller. Thus, for tags of a given length, there is a trade-off between the desired specificity and the 
size of repertoire. 
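The minimum-mismatch criterion can be checked for any candidate tag set. This sketch simplifies "aligned to achieve maximum sequence identity" to an ungapped, in-register comparison of equal-length tags (shifted or gapped alignments are not considered); in that case the mismatch count between two tags equals the number of mismatches when one tag pairs with the other tag's complement. The example tags are hypothetical.

```python
from itertools import combinations


def mismatches(a, b):
    """Positions at which two equal-length tags differ (ungapped, in register)."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_difference(tags):
    """Smallest number of mismatches between any two tags in the set."""
    return min(mismatches(a, b) for a, b in combinations(tags, 2))


tags = ["GATT", "ATGA", "TTAG"]
print(min_pairwise_difference(tags))             # 3
print(min_pairwise_difference(tags + ["GATA"]))  # 1: GATA differs from GATT at one position
```

Raising the required minimum rejects more candidate sequences, which is the specificity versus repertoire-size trade-off described above.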

The nucleotide sequences for oligonucleotides of a minimally cross-hybridizing set (repertoire)
can be conveniently enumerated by simple computer programs following the procedures described in
PCT Publication WO 96/41011 (see especially Fig. 1, the programs provided in Appendices Ia and Ib,
and the corresponding discussion in the text) and PCT Publication WO 97/46704 (Appendices I and II),
which are incorporated herein by reference. Program minhx of Appendix Ia in WO 96/41011 computes
all minimally cross-hybridizing sets having 4-mer subunits composed of three kinds of nucleotides.
Program tagN of Appendix Ib (WO 96/41011) enumerates longer oligonucleotides of a minimally
cross-hybridizing set. Program 3tagN (WO 97/46704, Appendix II) can be used to produce
minimally cross-hybridizing sets of double stranded tags for binding single-stranded tag complements.
Similar algorithms and computer programs are readily written for listing oligonucleotides of minimally
cross-hybridizing sets for any embodiment of the invention. Table I below provides guidance as to the
size of sets of minimally cross-hybridizing oligonucleotides for the indicated lengths and numbers of
nucleotide differences; the numbers shown were generated with the above-mentioned computer programs.
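The minhx, tagN and 3tagN programs are not reproduced here. As a rough illustration of the underlying idea only, the following Python sketch greedily collects equal-length words that differ from every previously kept word in at least a chosen number of positions. It assumes an ungapped, position-by-position comparison (a simplification of the maximum-identity alignment described above), and the function names are hypothetical. A greedy pass is not guaranteed to find a maximal set, although any set of 4-mers over three nucleotides whose members differ in at least three positions can contain at most 3^2 = 9 words, the maximal size listed for that case in Table I.

```python
from itertools import combinations, product


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_word_set(alphabet: str, length: int, min_diff: int) -> list[str]:
    """Scan all words in lexicographic order and keep each word that differs
    from every word kept so far in at least `min_diff` positions."""
    kept: list[str] = []
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        if all(mismatches(word, other) >= min_diff for other in kept):
            kept.append(word)
    return kept


def min_pairwise_mismatch(words: list[str]) -> int:
    """Smallest number of differences between any two words in the set."""
    return min(mismatches(a, b) for a, b in combinations(words, 2))


words = greedy_word_set("AGT", length=4, min_diff=3)  # 4-mers of A, G and T
print(len(words), words)
print("minimum pairwise difference:", min_pairwise_mismatch(words))
```

Running the search with a larger `min_diff` typically yields fewer words, which is the specificity versus repertoire-size trade-off described above.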

Table I

Oligonucleotide   Nucleotide Difference      Maximal Size of    Size of           Size of
Subunit Length    between Oligonucleotides   Minimally Cross-   Repertoire with   Repertoire with
                  of Minimally Cross-        Hybridizing Set    Four Subunits     Five Subunits
                  Hybridizing Set

 4                 3                            9               6561              5.90 x 10^4
 6                 3                           27               5.3 x 10^5        1.43 x 10^7
 7                 4                           27               5.3 x 10^5        1.43 x 10^7
 7                 5                            8               4096              3.28 x 10^4
 8                 3                          190               1.30 x 10^9       2.48 x 10^11
 8                 4                           62               1.48 x 10^7       9.16 x 10^8
 8                 5                           18               1.05 x 10^5       1.89 x 10^6
 9                 5                           39               2.31 x 10^6       9.02 x 10^7
10                 5                          332               1.21 x 10^10
10                 6                           28               6.15 x 10^5       1.72 x 10^7
11                 5                          187
18                 6                         ≈ 25000
18                12                           24
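The last two columns of Table I follow from the maximal set size: if each of the n subunits of a tag is chosen independently from a set of s words, there are s^n possible tags. The short Python sketch below simply recomputes the tabulated values from the set sizes in Table I; the list and function names are illustrative assumptions.

```python
# (subunit length, nucleotide difference, maximal set size) for the Table I
# rows that list repertoire sizes
TABLE_I_ROWS = [
    (4, 3, 9), (6, 3, 27), (7, 4, 27), (7, 5, 8), (8, 3, 190),
    (8, 4, 62), (8, 5, 18), (9, 5, 39), (10, 5, 332), (10, 6, 28),
]


def repertoire_size(set_size: int, n_subunits: int) -> int:
    """Number of distinct tags made by concatenating n_subunits words,
    each drawn independently from a set of set_size words."""
    return set_size ** n_subunits


for length, diff, size in TABLE_I_ROWS:
    four, five = repertoire_size(size, 4), repertoire_size(size, 5)
    # tag length in nucleotides is subunit length x number of subunits
    print(f"{length:>2}-mer words, {diff} diffs: "
          f"{4 * length} nt -> {four:.2e} tags, {5 * length} nt -> {five:.2e} tags")
```

For example, 8-mer words differing by four nucleotides give 62^4, about 1.48 x 10^7, distinct 32-nucleotide tags.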
For some embodiments of the invention, where extremely large repertoires of tags are not required,
oligonucleotide tags of a minimally cross-hybridizing set may be separately synthesized. Sets
containing several hundred to several thousands, or even several tens of thousands, of oligonucleotides
may be synthesized directly by a variety of parallel synthesis approaches, e.g. as disclosed in Frank et
al., U.S. patent 4,689,405; Frank et al., Nucleic Acids Research, 11: 4365-4377 (1983); Matson et al.,
Anal. Biochem., 224: 110-116 (1995); Fodor et al., International application PCT/US93/04145 (WO
93/22684); Pease et al., Proc. Natl. Acad. Sci., 91: 5022-5026 (1994); Southern et al., J. Biotechnology,
35: 217-227 (1994); Brennan, International application PCT/US94/05896 (WO 94/27719); Lashkari et
al., Proc. Natl. Acad. Sci., 92: 7912-7915 (1995); or the like.

Preferably, oligonucleotide tags of the invention are synthesized combinatorially out of subunits
(also referred to as "words") that are three to nine, and preferably three to six, nucleotides in length and
are selected from the same minimally cross-hybridizing set. For oligonucleotides in this range, the
members of such sets may be enumerated by computer programs based on the algorithm of Fig. 1 from
WO 96/41011.

Preferably, minimally cross-hybridizing sets comprise subunits that each make approximately the
same contribution to duplex (or triplex) stability as every other subunit in the set. In this way, the
stability of perfectly matched duplexes between every subunit and its complement is approximately
equal. Guidance for selecting such sets is provided by published techniques for selecting optimal PCR
primers and calculating duplex stabilities, e.g. Rychlik et al., Nucleic Acids Research, 17: 8543-8551
(1989) and 18: 6409-6412 (1990); Breslauer et al., Proc. Natl. Acad. Sci., 83: 3746-3750 (1986);
Wetmur, Crit. Rev. Biochem. Mol. Biol., 26: 227-259 (1991); and the like. For shorter tags, e.g. about
30 nucleotides or less, the algorithm described by Rychlik and Wetmur is preferred, and for longer tags,
e.g. about 30-35 nucleotides or greater, an algorithm disclosed by Suggs et al., pages 683-693 in
Brown, editor, ICN-UCLA Symp. Dev. Biol., Vol. 23 (Academic Press, New York, 1981) may be
conveniently employed. Clearly, there are many approaches available to one skilled in the art for
designing sets of minimally cross-hybridizing subunits within the scope of the invention. For example,
to minimize the effects of different base-stacking energies of terminal nucleotides when subunits are
assembled, subunits may be provided that have the same terminal nucleotides. In this way, when
subunits are linked, the sum of the base-stacking energies of all the adjoining terminal nucleotides will
be the same, thereby reducing or eliminating variability in tag melting temperatures.

A "word" of terminal nucleotides, denoted E below, may also be added to each end of a tag so that a
perfect match is always formed between it and a similar terminal "word" on any other tag complement.
Such an augmented tag would have the form:

    E    W1    W2    ...    Wk-1    Wk    E
    E'   W1'   W2'   ...    Wk-1'   Wk'   E'

where the primed symbols indicate complements. With the ends of tags always forming perfectly matched
duplexes, all mismatched subunits (words) will cause internal mismatches, thereby reducing the
stability of tag-complement duplexes that otherwise would have mismatched words at their ends. It is
well known that duplexes with internal mismatches are significantly less stable than duplexes with the
same mismatch at a terminus.
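The effect of the terminal word can be checked directly. In the Python sketch below, which uses example words taken from Table II and an arbitrarily chosen end word (both hypothetical choices), augmented tags that differ in one internal word are compared position by position. Because a tag and the complement of a different tag of the same length mismatch exactly where the two tags differ, these positions are the mismatches in the corresponding tag-complement duplex. The helper names are illustrative only.

```python
def augment(words: list[str], end_word: str) -> str:
    """Build a tag of the form E W1 ... Wk E from its internal words."""
    return end_word + "".join(words) + end_word


def mismatch_positions(a: str, b: str) -> list[int]:
    """0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


END = "GATT"  # hypothetical terminal word E
tag_a = augment(["TGAT", "TAGA", "TTTG"], END)
tag_b = augment(["TGAT", "GTAA", "TTTG"], END)  # differs in the middle word
tag_c = augment(["TGAT", "TAGA", "AAAG"], END)  # differs in the last internal word

for other in (tag_b, tag_c):
    positions = mismatch_positions(tag_a, other)
    # with E on both ends, no mismatch can fall in the first or last len(END) bases
    print(positions, "of", len(tag_a), "positions")
```

Without the flanking E words, the mismatches between `tag_a` and `tag_c` would lie in the terminal word of the duplex rather than in its interior.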

A preferred embodiment of minimally cross-hybridizing sets is one whose words are made up
of three of the four natural nucleotides. The absence of one type of nucleotide in the oligonucleotide
tags permits the tags to be converted from double- to single-stranded form, if desired, using the 5'→3'
exonuclease activity of a DNA polymerase to remove the tag complement strand (stripping reaction).

The following is an exemplary minimally cross-hybridizing set of words, each comprising four
nucleotides selected from the group consisting of A, G, and T:
Table II

Word:        w1     w2     w3     w4
Sequence:    GATT   TGAT   TAGA   TTTG

Word:        w5     w6     w7     w8
Sequence:    GTAA   AGTA   ATGT   AAAG

In this set, each member would form a duplex having three mismatched bases with the complement of
every other member.
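The three-mismatch property of this set can be verified mechanically. The Python sketch below counts position-by-position differences between every pair of words, which equals the number of mismatches a word forms with the complement of another word of the same length. It takes w6 as AGTA, the reading consistent with the A/G/T composition stated for the set; the variable names are illustrative.

```python
from itertools import combinations

# Words w1-w8 of Table II (w6 read as AGTA)
TABLE_II = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]


def mismatches(a: str, b: str) -> int:
    """Mismatches a word forms with the complement of another same-length word."""
    return sum(x != y for x, y in zip(a, b))


pair_counts = {(a, b): mismatches(a, b) for a, b in combinations(TABLE_II, 2)}
print(sorted(set(pair_counts.values())))  # [3]: every pair differs at exactly three positions
print(sorted(set("".join(TABLE_II))))     # ['A', 'G', 'T']
```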

Further exemplary minimally cross-hybridizing sets are listed below in Table III. Clearly,
additional sets can be generated by substituting different groups of nucleotides, or by using subsets of
known minimally cross-hybridizing sets.

Table III

Exemplary Minimally Cross-Hybridizing Sets of 4-mer Subunits (Words)

Set 1    Set 2    Set 3    Set 4    Set 5    Set 6
CATT     ACCC     AAAC     AAAG     AACA     AACG
CTAA     AGGG     ACCA     ACCA     ACAC     ACAA
TCAT     CACG     AGGG     AGGC     AGGG     AGGC
ACTA     CCGA     CACG     CACC     CAAG     CAAC
TACA     CGAC     CCGC     CCGG     CCGC     CCGG
TTTC     GAGC     CGAA     CGAA     CGCA     CGCA
ATCT     GCAG     GAGA     GAGA     GAGA     GAGA
AAAC     GGCA     GCAG     GCAC     GCCG     GCCC
         AAAA     GGCC     GGCG     GGAC     GGAG

Set 7    Set 8    Set 9    Set 10   Set 11   Set 12
AAGA     AAGC     AAGG     ACAG     ACCG     ACGA
ACAC     ACAA     ACAA     AACA     AAAA     AAAC
AGCG     AGCG     AGCC     AGGC     AGGC     AGCG
CAAG     CAAG     CAAC     CAAC     CACC     CACA
CCCA     CCCC     CCCG     CCGA     CCGA     CCAG
CGGC     CGGA     CGGA     CGCG     CGAG     CGGC
GACC     GACA     GACA     GAGG     GAGG     GAGG
GCGG     GCGG     GCGC     GCCC     GCAC     GCCC
GGAA     GGAC     GGAG     GGAA     GGCA     GGAA


C. Tag Synthesis 

Oligonucleotide tag complements for use in the invention are conveniently synthesized on an
automated DNA synthesizer, e.g. an Applied Biosystems model 392 or 394 DNA/RNA Synthesizer
(Perkin-Elmer Applied Biosystems, Foster City, CA), or a "GENE ASSEMBLER PLUS" (Pharmacia),
using standard chemistries, preferably phosphoramidite chemistry, e.g., as disclosed in the following
references: Beaucage and Iyer, Tetrahedron, 48: 2223-2311 (1992); Molko et al., U.S. patent
4,980,460; Koster et al., U.S. patent 4,725,677; Caruthers et al., U.S. patents 4,415,732; 4,458,066; and
4,973,679; M.J. Gait (Ed.), Oligonucleotide Synthesis, a Practical Approach, IRL Press, Oxford,
England (1990); and the like.

When microparticles are used as supports, repertoires of tag complements may be generated by
word-wise synthesis via "split and mix" techniques, e.g. as disclosed in Shortle et al., International
patent application PCT/US93/03418 (WO 93/21203) or Lyttle et al., Biotechniques, 19: 274-280
(1995). Briefly, the basic unit of the synthesis is a subunit ("word") of the tag sequence. Preferably,
phosphoramidite chemistry is used and 3'-phosphoramidite oligonucleotides are prepared for each word
in a minimally cross-hybridizing set; e.g. for the set of Table II above, there would be eight
3'-phosphoramidite tetramers. Synthesis proceeds as disclosed by Shortle et al. or in direct analogy with
the techniques employed to generate diverse oligonucleotide libraries using nucleosidic monomers, e.g.
as disclosed in Telenius et al., Genomics, 13: 718-725 (1992); Welsh et al., Nucleic Acids Research,
19: 5275-5279 (1991); Grothues et al., Nucleic Acids Research, 21: 1321-1322 (1993); Hartley,
European patent application no. 90304496.4; Lam et al., Nature, 354: 82-84 (1991); Zuckerman et al.,
Int. J. Pept. Protein Research, 40: 498-507 (1992); and the like. Generally, these techniques simply call
for the application of mixtures of the activated monomers (or polymer words) to the growing
oligonucleotide during the coupling steps.

Preferably, tag complements are synthesized on a DNA synthesizer having a number of synthesis
chambers which is greater than or equal to the number of different kinds of words used in the
construction of the tags. That is, preferably there is a synthesis chamber corresponding to each type of
word. In this embodiment, words are added nucleotide-by-nucleotide, such that if a word consists of
five nucleotides there are five monomer couplings in each synthesis chamber. After a word is
completely synthesized, the synthesis supports are removed from the chambers, mixed, and
redistributed back to the chambers for the next cycle of word addition. This latter embodiment takes
advantage of the high coupling yields of monomer addition, e.g. in phosphoramidite chemistries (e.g.,
see Example 1).
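The word-wise split and mix procedure can be pictured with a small simulation. The Python sketch below is a toy model under stated assumptions: each chamber adds a whole word in one step (standing in for its nucleotide-by-nucleotide couplings), beads are dealt out in equal numbers after a random shuffle that represents mixing, and the eight words of Table II serve as the word set. The function and parameter names are hypothetical.

```python
import random
from collections import Counter

# The eight 4-mer words of Table II, one synthesis chamber per word
WORDS = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]


def split_and_mix(n_beads: int, words: list[str], n_cycles: int, seed: int = 0) -> list[str]:
    """Toy model of word-wise split-and-mix synthesis.

    In each cycle the beads are shuffled ("mix") and dealt round-robin into
    one chamber per word ("split"), so every chamber receives an equal share;
    each bead then gains its chamber's word. Returns the sequence grown on
    each bead.
    """
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(n_cycles):
        order = list(range(n_beads))
        rng.shuffle(order)                    # mixing step
        for rank, bead in enumerate(order):   # splitting step
            beads[bead] += words[rank % len(words)]
    return beads


beads = split_and_mix(n_beads=800, words=WORDS, n_cycles=4)
print(len(set(beads)), "distinct sequences on", len(beads), "beads")
print(Counter(len(seq) for seq in beads))    # all beads reach 4 words x 4 nt = 16 nt
```

In the model, as in the described procedure, each bead accumulates exactly one word per cycle, so each bead carries a single sequence while the bead population as a whole samples many combinations.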

The tag and tag-complement sequences of the invention preferably range in length from 12 to
60 nucleotides or basepairs. Preferably, the tag sequences range in length from 18 to 40 nucleotides or
basepairs. More preferably, the tag sequences have lengths from 25 to 40 nucleotides or basepairs. In
terms of preferred and more preferred numbers of words, these ranges may be expressed as follows:

Table IV

Numbers of Subunits (Words) in Preferred Tag Embodiments

Monomers      Nucleotides in Oligonucleotide Tag
in Subunit    (12-60)          (18-40)          (25-40)

3             4-20 subunits    6-13 subunits    8-13 subunits
4             3-15 subunits    4-10 subunits    6-10 subunits
5             2-12 subunits    3-8 subunits     5-8 subunits
6             2-10 subunits    3-6 subunits     4-6 subunits
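The entries of Table IV appear to be obtained by integer (floor) division of each end of a tag-length range by the number of monomers per subunit. The Python sketch below reproduces the table on that assumption; note that rounding the lower bound down means, for example, that two 5-mer words (10 nucleotides) fall slightly short of a 12-nucleotide minimum. The names are illustrative.

```python
# Preferred tag-length ranges (nucleotides) from the text
TAG_RANGES = [(12, 60), (18, 40), (25, 40)]


def subunit_range(tag_range: tuple[int, int], word_len: int) -> tuple[int, int]:
    """Number of subunits at each end of a tag-length range, using integer
    (floor) division at both ends."""
    lo, hi = tag_range
    return lo // word_len, hi // word_len


for word_len in (3, 4, 5, 6):
    cells = [f"{a}-{b} subunits" for a, b in (subunit_range(r, word_len) for r in TAG_RANGES)]
    print(word_len, cells)
```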
Most preferably, oligonucleotide tags are single-stranded, and specific hybridization occurs via Watson-Crick
pairing with a tag complement.

Preferably, repertoires of tags and/or tag complements of the invention contain at least 100
members; more preferably at least 1,000 members; and most preferably at least 10,000 members.

In embodiments where specific hybridization occurs via triplex formation, coding of tag sequences
follows the same principles as for duplex-forming tags; however, there are further constraints on the
selection of word sequences. Generally, third-strand association via Hoogsteen-type binding is most
stable along homopyrimidine-homopurine tracts in a double stranded target. Usually, base triplets
form in T-A*T or C-G*C motifs (where "-" indicates Watson-Crick pairing and "*" indicates
Hoogsteen-type binding); however, other motifs are also possible. For example, Hoogsteen base
pairing permits parallel and antiparallel orientations between the third strand (the Hoogsteen strand) and
the purine-rich strand of the duplex to which the third strand binds, depending on conditions and the
composition of the strands. There is extensive guidance in the literature for selecting appropriate
sequences, orientation, conditions, nucleoside type (e.g. whether ribose or deoxyribose nucleosides are
employed), and base modifications (e.g. methylated cytosine, and the like) in order to maximize, or
otherwise regulate, triplex stability as desired in particular embodiments, e.g. Roberts et al., Proc. Natl.
Acad. Sci., 88: 9397-9401 (1991); Roberts et al., Science, 258: 1463-1466 (1992); Roberts et al., Proc.
Natl. Acad. Sci., 93: 4320-4325 (1996); Distefano et al., Proc. Natl. Acad. Sci., 90: 1179-1183 (1993);
Mergny et al., Biochemistry, 30: 9791-9798 (1991); Cheng et al., J. Am. Chem. Soc., 114: 4465-4474
(1992); Beal and Dervan, Nucleic Acids Research, 20: 2773-2776 (1992); Beal and Dervan, J. Am.
Chem. Soc., 114: 4976-4982 (1992); Giovannangeli et al., Proc. Natl. Acad. Sci., 89: 8631-8635
(1992); Moser and Dervan, Science, 238: 645-650 (1987); McShan et al., J. Biol. Chem., 267: 5712-5721
(1992); Yoon et al., Proc. Natl. Acad. Sci., 89: 3840-3844 (1992); Blume et al., Nucleic Acids
Research, 20: 1777-1784 (1992); Thuong and Helene, Angew. Chem. Int. Ed. Engl. 32: 666-690
(1993); Escude et al., Proc. Natl. Acad. Sci., 93: 4365-4369 (1996); and the like.
Conditions for annealing single-stranded or duplex tags to their single-stranded or duplex
complements are well known, e.g. Ji et al., Anal. Chem. 65: 1323-1328 (1993); Cantor et al., U.S.
patent 5,482,836; and the like. Use of triplex tags has the advantage of not requiring a "stripping"
reaction with polymerase to expose the tag for annealing to its complement, if the tag was initially
prepared in double-stranded form.

Preferably, oligonucleotide tags of the invention employing triplex hybridization are double
stranded DNA and the corresponding tag complements are single stranded. More preferably,
5-methylcytosine is used in place of cytosine in the tag complements in order to broaden the range of pH
stability of the triplex formed between a tag and its complement. Preferred conditions for forming
triplexes are fully disclosed in the above references. Briefly, hybridization takes place in concentrated
salt solution, e.g. 1.0 M NaCl, 1.0 M potassium acetate, or the like, at pH below 5.5 (or 6.5 if
5-methylcytosine is employed). Hybridization temperature depends on the length and composition of the
tag; however, for an 18-20-mer tag or longer, hybridization at room temperature is adequate. Washes
may be conducted with less concentrated salt solutions, e.g. 10 mM sodium acetate, 100 mM MgCl2,
pH 5.8, at room temperature. Tags may be eluted from their tag complements by incubation in a similar
salt solution at pH 9.0.

Minimally cross-hybridizing sets of oligonucleotide tags that form triplexes may be generated by
the computer program 3tagN in Appendix II of WO 97/46704, or similar programs. An exemplary set
of double stranded 8-mer words is given below in capital letters, with the corresponding complements in
small letters. Each such word differs from each of the other words in the set by at least three base pairs.

Table V

Exemplary Minimally Cross-Hybridizing Set of Double Stranded 8-mer Tags

5'-AAGGAGAG    5'-AAAGGGGA    5'-AGAGAAGA    5'-AGGGGGGG
3'-TTCCTCTC    3'-TTTCCCCT    3'-TCTCTTCT    3'-TCCCCCCC
3'-ttcctctc    3'-tttcccct    3'-tctcttct    3'-tccccccc

5'-AAAAAAAA    5'-AAGAGAGA    5'-AGGAAAAG    5'-GAAAGGAG
3'-TTTTTTTT    3'-TTCTCTCT    3'-TCCTTTTC    3'-CTTTCCTC
3'-tttttttt    3'-ttctctct    3'-tccttttc    3'-ctttcctc

5'-AAAAAGGG    5'-AGAAGAGG    5'-AGGAAGGA    5'-GAAGAAGG
3'-TTTTTCCC    3'-TCTTCTCC    3'-TCCTTCCT    3'-CTTCTTCC
3'-tttttccc    3'-tcttctcc    3'-tccttcct    3'-cttcttcc

5'-AAAGGAAG    5'-AGAAGGAA    5'-AGGGGAAA    5'-GAAGAGAA
3'-TTTCCTTC    3'-TCTTCCTT    3'-TCCCCTTT    3'-CTTCTCTT
3'-tttccttc    3'-tcttcctt    3'-tccccttt    3'-cttctctt
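The stated properties of the Table V set can be checked with a short Python sketch. It confirms that the sixteen upper-strand words are purine-only (as needed for the homopurine strand of a triplex target), that their base-by-base Watson-Crick complements (written 3'→5', without reversal) reproduce the second line of each entry, and that the smallest number of positional differences between any two words is three. The word list is transcribed from the table; the variable names are illustrative.

```python
from itertools import combinations

# Upper (purine) strands of the sixteen 8-mer words in Table V
TABLE_V = [
    "AAGGAGAG", "AAAGGGGA", "AGAGAAGA", "AGGGGGGG",
    "AAAAAAAA", "AAGAGAGA", "AGGAAAAG", "GAAAGGAG",
    "AAAAAGGG", "AGAAGAGG", "AGGAAGGA", "GAAGAAGG",
    "AAAGGAAG", "AGAAGGAA", "AGGGGAAA", "GAAGAGAA",
]

# Base-by-base Watson-Crick complement (no reversal), matching the table,
# where the complementary strand is written 3'->5' beneath each word
WATSON_CRICK = str.maketrans("ACGT", "TGCA")


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


min_diff = min(mismatches(a, b) for a, b in combinations(TABLE_V, 2))
purine_only = all(set(word) <= {"A", "G"} for word in TABLE_V)
complements = [word.translate(WATSON_CRICK) for word in TABLE_V]

print("words:", len(TABLE_V), "min. difference:", min_diff, "purine-only:", purine_only)
print(complements[:4])
```

The count of sixteen words agrees with the maximal set size given for 8-mer words with three differences in Table VI below.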
Table VI

Repertoire Size of Various Double Stranded Tags
That Form Triplexes with Their Tag Complements

Oligonucleotide   Nucleotide Difference      Maximal Size of    Size of           Size of
Word Length       between Oligonucleotides   Minimally Cross-   Repertoire with   Repertoire with
                  of Minimally Cross-        Hybridizing Set    Four Words        Five Words
                  Hybridizing Set

 4                 2                            8               4096              3.2 x 10^4
 6                 3                            8               4096              3.2 x 10^4
 8                 3                           16               6.5 x 10^4        1.05 x 10^6
10                 5                            8               4096
15                 5                           92
20                 6                          765
20                 8                           92
20                10                           22
Preferably, repertoires of double stranded oligonucleotide tags of the invention contain at least 10
members; more preferably, repertoires of such tags contain at least 100 members. Preferably, words are
between 4 and 8 nucleotides in length for combinatorially synthesized double stranded oligonucleotide
tags, and tag sequences for triplex formation are from 12 to 60 base pairs in length. More preferably,
the tag sequences are from 18 to 40 base pairs in length.

D. Solid Phase Linkers

During synthesis, tag complements of the invention are bound to one or more solid phase supports
by first and second types of linking moieties, such that (1) both types of linking moieties remain intact
(uncleaved) during tag complement synthesis, and (2) the first type of linking moiety can be cleaved to
release the synthesized oligonucleotide under conditions in which the second linking moiety (or
moieties) is/are not cleaved. Preferably, the cleavable linkage is a chemically cleavable linkage, i.e.,
suitable for selective cleavage without using a cleaving enzyme.

A large number of linking chemistries, and their cleavage properties, are known. Exemplary
cleavable linking moieties, which are not intended to be limiting, include base-labile, acid-labile,
photolabile, reducible, and enzyme-labile moieties.

Base-labile linkages include, for example, esters (-C(O)O-), thioesters (-C(O)S-), and 2-oxyethyl
sulfones (-OCH2CH2-S(=O)2R). Such linkages can be cleaved under basic conditions, typically
using a pH ≥ 8.0, preferably pH ≥ 9.0, for a suitable time, and optionally including a suitable
nucleophile, such as ammonia, to increase pH and/or to promote cleavage by nucleophilic attack. For
example, a di(2-oxyethyl)sulfone linking moiety can be cleaved by reaction with ammonia (28-30% in
water) under mild conditions (55°C for 12-15 hours) without significant harm to the oligonucleotide
(Example 1). Synthesis of a support containing a mixture of succinate ester (cleavable) and
succinamide (non-cleavable) linking moieties is described in Example 3.

Acid-labile linkages include, for example, esters (-C(O)O-), thioesters (-C(O)S-), ketals
(-O-C(R1R2)-O-), hemiketals (-O-CHR-O-), and phosphoramidates (-OP(=O)(O-)-N-). A
phosphoramidate linkage, for instance, can be introduced to a support by methods described in
Hirschbein et al. (PCT App. PCT/US96/10418, Pub. No. WO 97/31009) or Gryaznov et al. (U.S. Patent
No. 5,599,922), and can be cleaved by treatment with 0.8% trifluoroacetic acid in dichloromethane for
40 minutes at room temperature (see also Gryaznov et al., Nucleic Acids Res. 20: 3403-3409 (1992)).

Exemplary photolabile linkers include 2-nitrobenzyl esters, such as described by D.J. Yoo et al.,
J. Org. Chem. 60: 3358-3364 (1995); D.L. McMinn et al., Tetrahedron 52: 3827-3840 (1996); and U.S.
Patent No. 5,430,136 to Urdea.

Reducible linkers are exemplified by disulfide linking groups, which are readily reduced with
2-mercaptoethanol or dithiothreitol, for example (Gryaznov et al., Nucleic Acids Res. 21: 1403-1408
(1993); U.S. Patent No. 5,118,605 to Urdea).

Enzyme-labile groups include any linking moiety that can be cleaved by an enzyme. Exemplary
linking moieties include ester groups (-C(O)O-), peptide groups (-NH-C(O)-), thioesters (-C(O)S-),
and polynucleotide sequences that contain endonuclease restriction sites (e.g., U.S. Patent No.
4,775,619 to Urdea). In one preferred embodiment the linking moiety is an oligopeptide containing an
amino acid residue that is recognized by a specific protease, e.g., lysine-X or arginine-X (trypsin);
aspartate-X (Staphylococcus aureus V8 protease); or tyrosine-X, tryptophan-X, phenylalanine-X,
leucine-X, or methionine-X (α-chymotrypsin); where X represents the next residue in the N→C
direction of the peptide. If cleavage of the oligopeptide linking group yields an oligonucleotide-peptide
conjugate, the peptide portion can be removed, if desired, using pronase or another suitable protease, or
the oligonucleotide can include an endonuclease restriction site that is positioned to permit selective
removal of the peptide portion from the oligonucleotide.

Other chemical linkages are also contemplated, such as oligopeptides containing methionine,
which are chemically cleavable using cyanogen bromide under acidic conditions. Additional references
which describe cleavable linking moieties include Monforte et al., International App. No.
PCT/US96/06116 (WO 96/37630); Urdea (U.S. Patent 5,380,833); Wong, S.S., Chemistry of Protein
Conjugation and Cross-Linking, CRC Press, Boca Raton, FL, 1991 (particularly pp. 63-67); and Allen,
G., Sequencing of Proteins and Peptides, Elsevier Science Pub. B.V., New York, 1983 (particularly
Chapter 3), which are incorporated herein by reference.

Many stable linking moieties are also known that are generally resistant to the cleavage conditions
mentioned above. Exemplary stable, "non-cleavable" moieties include ethers and polyethers,
polyamines, phosphoramidites, phosphoramidates, and the like. These moieties are resistant under
most acidic and basic conditions, in contrast to some of the groups noted above, and are therefore
particularly convenient choices for the non-cleavable linkages used in the invention.

Exemplary linkers for attaching and/or synthesizing tags on microparticle surfaces are described in
Pon et al., Biotechniques, 6: 768-775 (1988); Webb, U.S. patent 4,659,774; Barany et al., International
patent application PCT/US91/06103 (WO 92/04384); Brown et al., J. Chem. Soc. Commun., 1989:
891-893; Damha et al., Nucleic Acids Research, 18: 3813-3821 (1990); Beattie et al., Clin. Chem., 39:
719-722 (1993); Maskos and Southern, Nucleic Acids Research, 20: 1679-1684 (1992); and the like.
III. Repertoire Synthesis Method

The present invention provides a method for synthesizing a repertoire of oligonucleotide tags and a
corresponding repertoire of tag complements, such that synthesis fidelity and yields of desired tags are
improved, and problems caused by failure sequences are reduced, relative to previous methods in which
tags and tag complements were synthesized separately.

WO 00/26411 PCT/US99/25680

In one embodiment, the method entails (1) synthesizing a plurality of different-sequence
oligonucleotide populations immobilized on one or more solid phase supports, wherein each population
comprises a plurality of same-sequence oligonucleotides, and the oligonucleotides in each population
comprise a tag complement sequence that is distinct relative to the tag complement sequences in the other
populations, (2) cleaving a fraction of each oligonucleotide population from the one or more supports, and
(3) inserting the cleaved oligonucleotides into a plurality of cloning vectors to form a tag-vector library.
Such a process is illustrated in Fig. 2, where tag-vector construct i, containing a cleaved tag complement
sequence To, illustrates a member of a tag-vector library.

The solid phase support(s) for use in the invention can be prepared by any suitable method in light of
the description in section II above. As noted above, the linking moieties that link the tag complements to
the support comprise a mixture of at least two different linking moieties comprising a first, cleavable
linking moiety and a second, non-cleavable linking moiety. Both the cleavable and non-cleavable linking
moieties are selected to withstand the oligonucleotide synthesis protocol that is used to build the tag
complement sequences on the support. The non-cleavable linking moiety must also be able to withstand
conditions used later to hybridize tagged molecules to the support-bound tag complements.

The ratio of cleavable:non-cleavable linking moieties on the support is selected to ensure that (1) a
sufficient amount of the tag complement repertoire synthesized on the support can be cleaved from the
support, for preparing a complementary tag repertoire, (2) the cleaved oligonucleotides are clonable or can
readily be made clonable, and (3) a sufficient amount of tag complements remains on the support for
capturing tagged molecules when the support is used for sorting. Preferably, the cleavable linking moieties
constitute less than 50%, and more preferably from about 10% to about 30%, of total bound tag
complements. It is also preferred that a selected fraction of each population of synthesized
oligonucleotides is bound to the support or supports via base-cleavable linkages.

A large number of suitably reactive supports for attaching the desired linking moieties are known, as
discussed above. For example, hydroxyl- or amino-derivatized supports suitable for attaching selected
linker groups are commercially available or can be prepared from readily available materials.

The cleavable and non-cleavable linking groups can be attached to the support sequentially or
simultaneously. Preferably, the linking moieties are introduced onto a solid phase support by
simultaneously reacting a mixture of cleavable and non-cleavable linkers with complementary reactive
groups on the support. Thus, the relative proportions of cleavable and non-cleavable moieties on the
support can be controlled by appropriate selection of the linker ratio for the derivatization reaction, taking
into account the relative reactivities of the linkers with respect to the complementary reactive groups on the
solid support. For example, when the cleavable and non-cleavable linkers are equally reactive toward the
support, a cleavable:non-cleavable linker ratio of 10:90 can be used to produce a derivatized support in
which 10% of the attached tag complements are cleavable from the support. Similarly, a ratio of 30:70 can
be used to produce a derivatized support in which 30% of the tag complements are cleavable. If necessary,
the relative proportion of a less reactive linker can be increased to offset a higher reactivity of the other
linker type.
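The adjustment for unequal reactivities can be made concrete with a simple competitive-reaction model. The Python sketch below assumes (an assumption not stated in the text) that the two linkers compete for the same support sites with a constant rate-constant ratio r = k_cleavable / k_non_cleavable and that the linker mixture is in excess, so its composition does not drift. Under that model a feed mole fraction x of cleavable linker gives a cleavable fraction f = r·x / (r·x + 1 - x), which inverts to x = f / (f + r(1 - f)). Function names are hypothetical.

```python
def cleavable_fraction(feed_cleavable: float, relative_reactivity: float = 1.0) -> float:
    """Expected fraction of attached linkers that are cleavable.

    feed_cleavable: mole fraction of cleavable linker in the linker mixture.
    relative_reactivity: assumed constant ratio k_cleavable / k_non_cleavable.
    Assumes the two linkers compete for the same sites and that the mixture
    is in excess, so its composition does not drift during the reaction.
    """
    c = relative_reactivity * feed_cleavable
    n = 1.0 - feed_cleavable
    return c / (c + n)


def feed_for_target(target_fraction: float, relative_reactivity: float = 1.0) -> float:
    """Inverse of cleavable_fraction: feed mole fraction giving target_fraction."""
    f, r = target_fraction, relative_reactivity
    return f / (f + r * (1.0 - f))


print(cleavable_fraction(0.10))                        # equal reactivity, 10:90 feed
print(cleavable_fraction(0.30))                        # equal reactivity, 30:70 feed
feed = feed_for_target(0.10, relative_reactivity=0.5)  # cleavable linker half as reactive
print(round(feed, 3), round(cleavable_fraction(feed, 0.5), 3))
```

With equal reactivities the model reproduces the 10:90 and 30:70 examples above; with a less reactive cleavable linker (r below 1) the feed must contain proportionally more of it, as the text notes.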
For example, in the protocol described in Example 1, glycidyl methacrylate/ethylene dimethacrylate
(95/5) crosslinked beads derivatized with ethylene diamine were reacted simultaneously with a 10:90 molar
ratio of cleavable:non-cleavable linkers, by standard phosphoramidite reaction chemistry. By this reaction,
both linkers are attached to amino groups on the support to form phosphoramidite linkages, which can later
be oxidized to form stable phosphoramidate groups (Figure 1). The cleavable linker, which can be
introduced onto the support as
2-[2-(4,4'-dimethoxytrityloxy)ethylsulfonyl]ethyl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite,
contains a di(2-oxyethyl) sulfone moiety which is stable to acidic and oxidative conditions and remains
intact during oligonucleotide synthesis, but is susceptible to base-catalyzed cleavage as shown in
Figure 1. The non-cleavable linker in Fig. 1, which can be added to the support as
9-O-dimethoxytrityl-triethylene glycol, 1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite,
comprises a polyethylene oxide linker that is stable to oligonucleotide synthesis conditions and to the basic
conditions that can be used to cleave the cleavable linking moiety.

Oligonucleotides containing the tag complements are added to the linker-derivatized support using
any appropriate reaction conditions. A wide variety of synthesis strategies have been developed in the art
and have been described (e.g., Gait, 1990 (supra), and other synthesis references cited above). Currently,
the solid-phase phosphoramidite method is preferred, since it provides coupling yields of about 99% or
greater. Preferably, synthesis is automated using any of a variety of available instruments and reagents to
provide efficient synthesis.

To facilitate the preparation of a tag repertoire that is complementary to the tag complements, the
immobilized tag complement oligonucleotides preferably include first and second primer segments that
flank the tag complement sequence in each oligonucleotide. Thus, following attachment of the linkers to
the support, a first primer sequence is preferably added to the linker, typically by sequential addition of the
appropriate monomers. In the protocol described in Example 1, a primer segment having the
sequence 5'-TCCTTAATTAACTGGTCTCACTGTCGCA-3' (SEQ ID NO:1) is added by sequential
monomer addition in the 3' to 5' direction using phosphoramidite-phosphotriester chemistry.
Alternatively, a primer segment can be synthesized separately and can be attached to the linker in a
single coupling reaction. Preferably, the linker and any attached primer sequence together provide a
spacer moiety containing a stretch of at least 10 chain atoms, to help separate the tag complement
segment (discussed immediately below) from the surface of the support and from other tag
complements on the support. Preferably, the spacer (including an optional primer-binding sequence)
has a length of from 10 to 30 nucleotides. The presence of a spacer moiety is particularly useful to
ensure that the tag complements will be readily accessible to hybridization with complementary tag
sequences for sorting.
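Because the primer segment is written 5'→3', an oligonucleotide that anneals to it (for example, one used to prime copying of cleaved tag complements) would carry its reverse complement. The Python sketch below computes that sequence for SEQ ID NO:1; it is only a sequence manipulation, and the text does not itself specify such a primer. It also confirms that the 28-nucleotide segment lies within the preferred 10-30 nucleotide spacer length. The constant and function names are illustrative.

```python
PRIMER_SEQ_ID_1 = "TCCTTAATTAACTGGTCTCACTGTCGCA"  # 5'->3', SEQ ID NO:1

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, returned 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


annealing_seq = reverse_complement(PRIMER_SEQ_ID_1)
print(len(PRIMER_SEQ_ID_1))  # 28 nt, inside the preferred 10-30 nt spacer range
print(annealing_seq)         # TGCGACAGTGAGACCAGTTAATTAAGGA
```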

The tag complement sequence in each oligonucleotide can be added by sequential addition of
monomers or blocks of monomers, or by simultaneous attachment of a separately synthesized repertoire of
tag complements, with sequential addition of monomers being preferred. Repertoires can be formed on a
planar array by photolithographic or robotic dispensing methods as are known in the art (supra). For
repertoires formed on beads or particles, tag complement sequences are preferably formed combinatorially
using a split and mix approach, such as described by Brenner in PCT Publication No. WO 96/41011. In
brief, a plurality of different, minimally cross-hybridizing oligonucleotide "word" sequences are selected,
which are designed to ensure that the synthesized tag complements will only be able to hybridize to their
corresponding complementary tags under selected hybridization conditions, i.e., so that the levels of any
incorrectly hybridized tags and tag complements are insignificant.
In the protocol described in Example 1, each tag complement sequence consists of a linear string of
eight "words", each selected from the same set of eight tetrameric words. The primer-derivatized beads are
divided into eight aliquots, which are loaded into eight separate synthesis columns on an automated
synthesizer. In each column, a different one of the eight possible words is added to the beads
by sequential addition of the appropriate monomers. After synthesis of the first word in each column is
complete, the beads from the eight columns are combined and gently mixed to homogeneity.
After mixing, the bead mixture is again divided into equal aliquots, which are loaded into the eight columns
for another cycle of word addition. A total of eight cycles of word addition are performed, producing
beads with a total of 8^8 (= 1.7 x 10^7) different possible tag complement sequences. The split and mix
protocol ensures that each bead carries a substantially uniform population of same-sequence
oligonucleotides, i.e., the oligonucleotides on a particular bead have substantially the same sequence.
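The split and mix bookkeeping can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the synthesizer protocol itself: each bead is modeled as a single string, the eight words are taken from the list in Example 1 (reading the garbled first entry as CTTT), and `split_and_mix` is a hypothetical helper name.

```python
import random

# Eight tetrameric words from Example 1 (first entry read as CTTT; an assumption).
WORDS = ["CTTT", "TACT", "ACAT", "AATC", "TTAC", "TCTA", "ATCA", "CAAA"]


def split_and_mix(n_beads, words, n_cycles, seed=0):
    """Simulate split-and-mix synthesis; each bead is one 5'->3' string."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    k = len(words)
    for _ in range(n_cycles):
        rng.shuffle(beads)  # pooling and mixing to homogeneity
        # Bead i goes to column i % k, giving k roughly equal aliquots.
        # Synthesis runs 3'->5', so each new word is prepended.
        beads = [words[i % k] + seq for i, seq in enumerate(beads)]
    return beads


n_possible = len(WORDS) ** 8       # 8 choices in each of 8 cycles
print(n_possible)                  # 16777216, i.e. about 1.7 x 10^7

beads = split_and_mix(n_beads=1000, words=WORDS, n_cycles=8)
print(len(beads[0]))               # 32 nucleotides of tag complement per bead
```

Each simulated bead ends up with exactly one sequence built from whole words, mirroring the statement that the oligonucleotides on a given bead share substantially the same sequence; a real bead carries many copies of it.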
If necessary, the sequences of immobilized tag complements can be determined directly on the solid
phase support by any known sequencing method. Preferably, the tag complements contain a universal
primer-binding sequence on the 3'-side of the tag complement sequence, so that the tag complements can
be sequenced by the Sanger dideoxy method, by annealing and extending a complementary
sequencing primer. Usually, however, it is not necessary to know the sequences of the immobilized tag
complements.

Following addition of the tag complement sequence, additional nucleotide residues can be appended
to incorporate a second primer sequence, one or more endonuclease restriction sites, and/or any other
desired features. For example, in the protocol of Example 1, a hexameric block of GGGCCC is appended
to the tag complement to (1) help anchor the tag complement to its corresponding complementary tag
during hybridization, (2) define the end of the tag complement segment, and (3) introduce a Bsp120I
restriction site for subsequent ligation to a vector. This is followed by addition of a second primer
sequence (primer 2) for optional amplification, as discussed further below.

Thus, in a preferred embodiment, the immobilized oligonucleotides that contain the tag complement
sequences comprise, in order from the support, a linking moiety, a spacer segment, a tag complement
segment, and a terminal segment, wherein the linking moiety is either cleavable or non-cleavable as
discussed above; the spacer segment (preferably having a length of 10 to 30 nucleotides) optionally
contains (1) an endonuclease restriction site and/or (2) a universal primer-binding sequence; the tag
complement segment (preferably having a length of 12 to 60, 18 to 40, or 25 to 40 nucleotides) has the
features discussed above; and the terminal segment (preferably having a length of 0 to 40 nucleotides, or
10 to 30 nucleotides) optionally contains (1) an endonuclease restriction site, which may be the same as or
different from the first restriction site, and (2) a universal primer sequence, which may be the same as or
different from the complement of any primer-binding sequence in the spacer. In one preferred embodiment,
the tag complement sequence in each oligonucleotide is flanked by first and second endonuclease
restriction sites, which may be the same or different, to facilitate subsequent ligation into a selected vector.
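One way to make the preferred segment lengths concrete is a small data structure that flags any segment outside the broadest preferred ranges named above. This is a hypothetical sketch: the class and field names are illustrative, not from the source; the linking moiety is omitted because it is not a nucleotide segment; and the example values assume that, in Example 1, primer 1 forms the spacer, eight words form the tag complement, and CCTTAGGGCCC forms the terminal segment.

```python
from dataclasses import dataclass

# Broadest preferred lengths (nt) named in the text: (minimum, maximum).
PREFERRED = {"spacer": (10, 30), "tag_complement": (12, 60), "terminal": (0, 40)}


@dataclass
class ImmobilizedTagOligo:
    """Nucleotide segments of an immobilized oligonucleotide, each written 5'->3'."""
    spacer: str          # support-proximal; may hold a restriction site / primer-binding site
    tag_complement: str
    terminal: str        # 5'-most; may hold a second restriction site / primer

    def out_of_range(self):
        """Names of segments whose length lies outside PREFERRED."""
        bad = []
        for name, (lo, hi) in PREFERRED.items():
            if not lo <= len(getattr(self, name)) <= hi:
                bad.append(name)
        return bad

    def sequence(self):
        """Full sequence 5'->3'; the 3' end is attached to the support via the linker."""
        return self.terminal + self.tag_complement + self.spacer


example = ImmobilizedTagOligo(
    spacer="TCCTTAATTAACTGGTCTCACTGTCGCA",   # primer 1, 28 nt
    tag_complement="CTTT" * 8,               # one possible 8-word tag complement
    terminal="CCTTAGGGCCC",                  # primer 2 joined to the appended CCC
)
print(example.out_of_range())  # []
```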

After oligonucleotide synthesis is complete, the oligonucleotides are deprotected if necessary, and a
portion of the tag complement repertoire is cleaved from the support(s) under reaction conditions that
selectively cleave only the cleavable linking moieties while leaving the non-cleavable linking
moieties intact.

Conveniently, the cleavage step and any deprotection steps are performed simultaneously using a
single reagent. For example, if the cleavable linking moiety and one or more protecting groups are
base-labile, the support can be treated with base to remove the protecting groups and cleave the cleavable
oligonucleotides from the support. Of course, the deprotection and cleavage steps can also be performed
separately.

The cleaved oligonucleotide mixture can be inserted directly into a selected cloning vector, as
discussed further below. However, the mixture is preferably purified to remove failure sequences and other
extraneous materials, using any convenient method. For example, the mixture can be rapidly and
conveniently purified by ion-exchange or reverse-phase HPLC using known methods.
Electrophoretic separations can also be used. For reverse-phase chromatography, the synthesized
oligonucleotides preferably contain a hydrophobic group, such as trityl or dimethoxytrityl (DMT), so that
tritylated oligonucleotides are preferentially adsorbed while non-tritylated
sequences (arising from failed coupling steps) are substantially unretained by the column. Typically, the
desired oligonucleotide mixture elutes (or migrates) as a broad peak, since a vast spectrum of sequences
with different G/C content and distribution is present. The mixture may be further purified by ethanol
extraction or the like, and the resultant mixture can be stored dry or in a stable storage medium, preferably
below 0°C. The integrity of the oligonucleotide sequences can be evaluated by randomly
sequencing individual members of the mixture (usually after the mixture has been inserted into a cloning
vector).

For subsequent attachment to sample polynucleotides of interest, the cleaved oligonucleotides are
preferably inserted into one or more cloning vectors to form a tag vector library. Any type of replicable
vector can be used, such as a plasmid, phagemid, cosmid, or the like, that can be propagated in an
appropriate host. The vector can be single-stranded or double-stranded. A variety of suitable vectors are
available from commercial sources (e.g., New England Biolabs, Clontech, GibcoBRL, etc.) or have been
described in the literature. The vector will generally contain one or more restriction sites for inserting
foreign DNA, at least one selection marker (e.g., ampicillin resistance), optionally a universal primer
sequence upstream of the insertion site (e.g., for sequencing the insert), and any components necessary for
propagating the vector in an appropriate host. Preferably, the vector includes a first restriction site for
inserting an oligonucleotide tag, and a second restriction site immediately adjacent to the first site, for
inserting a sample fragment in close proximity to a tag. Alternatively, the restriction site for inserting the
sample fragment can be introduced into the vector via the tag oligonucleotide, which contains an
endonuclease site immediately adjacent to the tag segment.
For insertion into a double-stranded vector, the cleaved oligonucleotides are preferably converted to
double-stranded form by annealing a complementary primer to a universal 3'-primer segment in the
oligonucleotides and extending the primer by any suitable means. The double-stranded product contains a
tag sequence that is perfectly base-paired with a corresponding tag complement sequence. Typically, the
annealed primer is extended in the 3' direction using a DNA polymerase in the presence of a mixture of
dNTPs. To increase the quantity of double-stranded oligonucleotide for insertion into the vector, the
oligonucleotide mixture may be amplified by polymerase chain reaction, using a pair of primers that anneal
to primer-binding sequences flanking the tag complement sequences, until the desired amount of
double-stranded product is obtained.
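The conversion to double-stranded form can be modeled at the sequence level: a primer that is perfectly complementary to the 3'-terminal primer segment is extended across the whole template, so the new strand is the reverse complement of the template. The sketch below assumes complete, error-free extension; the helper names are hypothetical, and the primer in the example is a hypothetical 17-mer complementary to the 3' end of primer 1, not a primer named in the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(template, primer):
    """Product of fully extending `primer` annealed to the 3' end of `template`.

    Both strings are written 5'->3'. Raises ValueError if the primer is not
    perfectly complementary to the template's 3'-terminal segment.
    """
    if not template.endswith(reverse_complement(primer)):
        raise ValueError("primer does not anneal to the 3' end of the template")
    # Extension from the primer's 3' end copies the rest of the template,
    # so the finished strand is the full reverse complement.
    return reverse_complement(template)


template = "CCTTAGGGCCC" + "TACT" * 8 + "TCCTTAATTAACTGGTCTCACTGTCGCA"
primer = reverse_complement("CTGGTCTCACTGTCGCA")  # hypothetical primer
product = extend_primer(template, primer)
print(product.startswith(primer))  # True
```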

The double-stranded tag oligonucleotide mixture can be inserted into a vector by standard ligation
techniques, e.g., by blunt-end or sticky-end ligation. Typically, the vector and oligonucleotide mixture are
each treated separately with one or more restriction enzymes to produce products that have cohesive ends.
Preferably, the vector and oligonucleotide mixture are each treated separately with two different restriction
enzymes to produce products having two different sticky ends, which helps reduce religation of the vector
without incorporation of an insert. The cleaved vector can also be treated with calf intestinal phosphatase to
inhibit self-ligation of vector lacking an insert. For example, in Example 2D, the oligonucleotide mixture and
vector are each treated with PacI and Bsp120I. The larger fragment from the cleaved vector is treated with calf
intestinal phosphatase and is then reacted with the restricted oligonucleotide mixture in the presence of a
ligase, to form a mixture of tag vectors.

The ligation product can then be used to transform a suitable host to create a tag-vector library.
The transformants are cultured until the desired number of transformants has been collected.
Generally, the number of transformants should exceed the complexity of the tag repertoire to ensure
sampling of a high proportion of the members of the repertoire. Preferably, the number of
transformants is sufficient to ensure the presence of at least 90% of the repertoire in the tag-vector
library, more preferably at least 95%, and most preferably greater than 99%.
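How many transformants correspond to these percentages can be estimated with a standard sampling model. Assuming every tag is equally represented and each transformant carries an independently drawn tag (an idealization), the expected fraction of a K-member repertoire seen at least once after N transformants is 1 - (1 - 1/K)^N, which is close to 1 - exp(-N/K). Under this model, 90%, 95% and 99% coverage need roughly 2.3, 3.0 and 4.6 times K transformants. The function names are hypothetical.

```python
import math


def expected_coverage(n_transformants, complexity):
    """Expected fraction of the repertoire seen at least once (uniform sampling)."""
    # 1 - (1 - 1/K)^N, computed with log1p for accuracy when K is large.
    return 1.0 - math.exp(n_transformants * math.log1p(-1.0 / complexity))


def transformants_needed(fraction, complexity):
    """Smallest N with 1 - exp(-N/K) >= fraction (Poisson approximation)."""
    return math.ceil(-complexity * math.log1p(-fraction))


K = 8 ** 8  # repertoire size from Example 1
for f in (0.90, 0.95, 0.99):
    n = transformants_needed(f, K)
    print(f"{f:.0%}: {n} transformants (~{n / K:.1f} x the repertoire)")
print(f"10x excess: {expected_coverage(10 * K, K):.5f}")
```

The Poisson approximation slightly overestimates the required N, so the numbers it returns are on the safe side. Under the same assumptions, a 10-fold excess gives an expected coverage of about 1 - e^-10, i.e. above 99.99%.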
Tag-vector libraries of the invention are useful for simultaneously and uniquely tagging large
numbers of sample oligonucleotide fragments, so that the tagged oligonucleotides can be sorted onto an
array of tag complements for further processing or analysis. Thus, the tag-vectors can be used to form a
library of tagged polynucleotide fragments, by combining a tag-vector library with a mixture of
different polynucleotide fragments under ligation conditions effective to insert the different fragments
at a site adjacent to a tag sequence in each vector, so that each sample fragment (So in Fig. 2) becomes
associated with a different tag (TJ). One member of such a tag-vector library is represented by
construct 2 in Fig. 2. Preferably, the complexity of the tag-vector library exceeds the complexity of the
sample, to help reduce the possibility of the same tag becoming associated with two or more different
sample fragments.
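The benefit of a library whose complexity exceeds that of the sample can be estimated with a birthday-problem style calculation. Assuming each of S sample fragments receives one of K equally represented tags at random (an idealization, since real libraries are not perfectly uniform), the expected fraction of fragments whose tag is shared with no other fragment is (1 - 1/K)^(S - 1). The function name and sample sizes below are hypothetical.

```python
import math


def fraction_uniquely_tagged(n_fragments, n_tags):
    """Expected fraction of fragments whose tag no other fragment received."""
    # Each of the other n_fragments - 1 fragments avoids this fragment's tag
    # with probability 1 - 1/n_tags, independently.
    return math.exp((n_fragments - 1) * math.log1p(-1.0 / n_tags))


K = 8 ** 8                                   # tag repertoire from Example 1
for s in (10_000, 100_000, 1_000_000):       # hypothetical sample sizes
    print(s, round(fraction_uniquely_tagged(s, K), 4))
```

Under these assumptions, a sample of one million fragments (about 6% of the repertoire size) leaves roughly 94% of fragments uniquely tagged, and the fraction approaches 1 as the sample shrinks relative to K.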

For example, in Example 2D, the tag-vector library pBP9 was digested with BbsI and BamHI to
create a large vector fragment having a tag sequence adjacent to the BbsI site (within 20 nucleotides).
The vector fragment was treated with phosphatase to remove 5'-phosphate groups, and an
oligonucleotide sample mixture (cDNA) having BbsI/BamHI sticky ends was inserted into the vector
by ligation. The ligation mixture was used to transform E. coli host cells, and aliquots comprising
approximately 1.6 x 10' clones were used to inoculate liquid cultures in order to expand the library.
Plasmid DNA was extracted from these cultures and stored.
Tagged polynucleotides from the tagged-sample oligonucleotides in the library can be processed or
analyzed by any desired method. To facilitate detection, the tagged polynucleotides can be amplified
by PCR using a universal primer pair for co-amplification, followed by optional purification of the
desired products from template and amplification reagents (Example 2E). If the tags are to be used in
single-stranded form, for binding single- or double-stranded tag complements, the tag strands can be
separated from their complementary strands by denaturation and collection of the desired tag strands,
e.g., by solid phase capture of biotinylated tag strands on a streptavidin support. Alternatively, the
complementary strand can be removed using an exonuclease with 5'->3' or 3'->5' exonuclease activity
in the presence of a selected dNTP that is not contained in the tag complement region, so that the
exonuclease effectively stops at the end of the tag complement. For example, if the tag sequence
contains A, C and T bases, but not G, then the complementary sequence contains T, G and A but not C.
Use of an exonuclease in the presence of dCTP is effective to remove the complementary sequence until
the first C is reached in the complementary sequence.
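The stopping rule in this example can be stated as a simple string operation. The sketch below models only the 3'->5' case, assuming an enzyme that removes nucleotides from the 3' end of a strand until its 3'-terminal base matches the single supplied dNTP (the behaviour exploited with, for example, T4 DNA polymerase; the source does not name an enzyme). The function name and the example strand are hypothetical.

```python
def exo_trim_3prime(strand, dntp_base):
    """Trim a 5'->3' strand from its 3' end until its 3'-terminal base is dntp_base.

    Models a 3'->5' exonuclease that idles at the first position where the
    supplied dNTP can be re-incorporated. Returns "" if the base never occurs.
    """
    end = len(strand)
    while end > 0 and strand[end - 1] != dntp_base:
        end -= 1
    return strand[:end]


# Hypothetical strand: a C-containing 5' region followed by a C-free 3' region.
strand = "GACC" + "TGAAGTTAGTA"
print(exo_trim_3prime(strand, "C"))  # GACC
```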

The tagged oligonucleotides (also referred to as tagged oligonucleotide fragments) are hybridized
to the immobilized tag complement repertoire under suitable hybridization conditions, as discussed
above or as illustrated in Example 2E. If the solid support comprises particles or beads, the particles or
beads can be incubated in the presence of the tagged oligonucleotides with constant agitation to
facilitate hybridization, followed by a washing step. If only a small proportion of the beads are
hybridized with complementary tags, the hybridized beads can be separated from non-hybridized beads
by any suitable method, e.g., by FACS (fluorescence activated cell sorting). For this purpose, the
tagged polynucleotides preferably include a detectable label, such as a fluorescent label (e.g., introduced
via a PCR primer), to facilitate detection of hybridized tagged polynucleotides.

After hybridization to the immobilized tag complement repertoire, the tagged polynucleotides can
be analyzed according to any suitable method. For example, immobilized tagged polynucleotides can
be sequenced, e.g., using the immobilized polynucleotides as templates to generate labeled extension
products by cycle sequencing, e.g., as taught by Brenner, International App. No. PCT/US95/12678
(WO 96/12039) and International App. No. PCT/US96/03678 (WO 95/27080). Alternatively,
hybridized polynucleotides can be denatured from the support and sequenced according to known
solution phase methods.

The invention is also useful for sorting different cDNAs onto a solid support for sequence analysis
or quantification. For example, the tags can be used to conduct differential display experiments to
characterize changes in transcript levels between different samples, e.g., for characterization of different
cell and tissue types, disease conditions, etc. The invention can also be used for determining the
toxicity of various compounds, as disclosed in PCT App. No. PCT/US96/16342 (WO 97/13877).

If the tagged molecules are polypeptides, the immobilized polypeptides can be further 
characterized, e.g., by treatment with target specific antibodies to identify specific polypeptide species. 
Synthesis of tag-peptide conjugates is described, for example, in WO 96/12014 at page 14. 

Additional uses for the tag and tag-complement repertoires of the invention are disclosed, for
example, in WO 95/27080; WO 96/12014; WO 96/12039; WO 96/13877; WO 96/41011; and WO
97/46704, which are incorporated herein by reference, including: tracking, identifying, and/or sorting
classes or subpopulations of molecules, DNA sequencing, mRNA fingerprinting, etc.

The invention may be further understood in light of the following examples, which are not 
intended in any way to limit the scope of the invention. 

Example 1 
Synthesis of Tag Complements 

All commercial chemicals were of synthesis quality and were used without further purification.
Oligonucleotide syntheses were performed using a GENE ASSEMBLER PLUS-SPECIAL/4
PRIMERS synthesizer (Pharmacia) using 5.6 µm amino-derivatized polymeric beads (95% glycidyl
methacrylate/5% ethylene dimethacrylate derivatized with ethylene diamine, from Bangs Laboratories,
Inc., Fishers, Indiana). The beads were estimated to have an amino group density of about 10^ amino
groups per bead. In addition to the 25 µm polypropylene filters supplied with the synthesis columns
from Pharmacia, a GF/A glass filter mat (Whatman) was added to the end of each synthesis column to
improve retention of the beads. Alternatively, synthesis columns were constructed from stainless steel
tubing (1-5 mm length x 0.75 inch diameter) having screw-cap ends to hold a 2 µm pore size PEEK
filter at each end to retain the beads (such a column with a 5 mm length has a capacity of about 600 mg
of beads).

A total of 4.8 g of amino-derivatized beads was divided among 8 synthesis columns, each containing
600 mg of beads. To form the cleavable linker moieties, 250 mg of 5'-PHOSPHATE-ON reagent
(2-[2-(4,4'-dimethoxytrityloxy)ethylsulfonyl]ethyl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite,
from Clontech, catalog no. 5210-2) was dissolved in 3.8 mL of anhydrous acetonitrile to give a
concentration of 0.1 M. To form the non-cleavable linker moieties, 250 mg of SPACER
PHOSPHORAMIDITE (9-O-dimethoxytrityl-triethylene glycol, 1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite,
from Clontech, catalog no. 5260-3) was separately dissolved in 3.8 mL of anhydrous
acetonitrile to give a final concentration of 0.1 M. These two solutions were combined in the ratio
required to provide any desired % cleavability. For example, to achieve 10% cleavability, 1 part PHOSPHATE-ON
solution (e.g., 200 µL) was combined with 9 parts SPACER PHOSPHORAMIDITE solution (e.g., 1800
µL) in a dry bottle, which was then attached to the extra phosphoramidite port on the GENE
ASSEMBLER SPECIAL PRIMERS 4 instrument. All transfers were carried out with a syringe, and
the bottles were blanketed with argon to minimize exposure to water vapor in air.
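The volumes for any target cleavability follow directly from the 1:9 example. A minimal sketch, assuming that both 0.1 M solutions couple to the beads with equal efficiency, so that the cleavable fraction equals the volume fraction of PHOSPHATE-ON solution; the function name is hypothetical.

```python
def linker_mix_volumes(cleavable_fraction, total_ul):
    """Return (PHOSPHATE-ON uL, SPACER PHOSPHORAMIDITE uL) for equimolar 0.1 M solutions.

    Assumes equal coupling efficiency of the two phosphoramidites.
    """
    if not 0.0 <= cleavable_fraction <= 1.0:
        raise ValueError("cleavable_fraction must be between 0 and 1")
    phosphate_on_ul = cleavable_fraction * total_ul
    return phosphate_on_ul, total_ul - phosphate_on_ul


print(linker_mix_volumes(0.10, 2000))  # (200.0, 1800.0) for 10% cleavability
```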

Following addition of the linkers (cleavable and non-cleavable) to the beads, the DNA sequence
below (primer 1, SEQ ID NO:1) was synthesized on the beads in the 3' to 5' direction using the phosphite
triester (phosphoramidite) method (M.J. Gait, Editor, Oligonucleotide Synthesis, a Practical Approach,
IRL Press, London, UK, 1990, particularly Chapter 3), using the following protected monomers: T
(unprotected), A and C (benzoyl), and G (isobutyryl):

5'-TCCTTAATTAACTGGTCTCACTGTCGCA-3' (primer 1, SEQ ID NO:1)

Next, a different "word" sequence was appended to the primer template sequence in each column
by phosphoramidite synthesis, by sequential addition of the appropriate monomers. The sequences of
the 8 words were:

1) 5'-CTTT
2) 5'-TACT
3) 5'-ACAT
4) 5'-AATC
5) 5'-TTAC
6) 5'-TCTA
7) 5'-ATCA
8) 5'-CAAA
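A quick check of this word set supports the design rules stated earlier: none of the words contains G, each contains exactly one C, and every pair of words differs at 3 of their 4 positions. Hamming distance between aligned words is only a crude proxy for cross-hybridization (it ignores shifted alignments and duplex stability), and the first word is read as CTTT from a garbled original; both are assumptions of this sketch.

```python
from itertools import combinations

WORDS = ["CTTT", "TACT", "ACAT", "AATC", "TTAC", "TCTA", "ATCA", "CAAA"]


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


no_g = all("G" not in w for w in WORDS)          # tag complements lack G
c_counts = {w.count("C") for w in WORDS}         # C content of each word
distances = {hamming(a, b) for a, b in combinations(WORDS, 2)}
print(no_g, c_counts, distances)  # True {1} {3}
```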

Following completion of the first word addition step, the beads were collected from the columns,
mixed together, and reloaded into the columns, after which each different "word" sequence was again added
separately in the 8 columns by phosphoramidite synthesis as above, followed by bead collection,
remixing, and column reloading. This cycle of adding words and mixing/splitting beads was
performed a total of 8 times to create an oligonucleotide library of about 1.7 x 10^7 (= 8^8) different
oligonucleotide sequences. After this combinatorial synthesis was complete, the sequence 5'-CCC-3'
(trityl-on) was added to the beads, resulting in a mixture of immobilized oligonucleotides having the
following general formula (SEQ ID NO:2):

5'-CCC-[word]8-TCCTTAATTAACTGGTCTCACTGTCGCA-3'-Support
                    PacI



Following addition of the CCC segment, a portion of the beads (4.3 g = 90%) was deprotected
with saturated aqueous ammonia (2 mL, 28-30% in water) at 55°C for 12 hours. The deprotected beads
were then treated with acetic acid for 15 minutes to remove the terminal 5'-trityl group. After the acid
was removed, the beads were washed with methanol and concentrated in vacuo. The beads were then
ready for hybridization.

The remaining portion of the beads (0.5 g = 10%) was reacted with the appropriate phosphoramidite
monomers to add the sequence 5'-CCTTAGGG-3' (primer 2, with trityl on, SEQ ID NO:3), followed by
deprotection with ammonia (2 mL) at 55°C for 12 hours. The ammonia treatment was also effective to
cleave the cleavable linkers, releasing 10% of the oligonucleotides from the beads. The beads were
removed by centrifugation, and the remaining ammonia solution was concentrated by vacuum to half of
its original volume. The cleaved oligonucleotides in this mixture had the general formula (SEQ ID
NO:4):



5'-CCTTAGGGCCC-[word]8-TCCTTAATTAACTGGTCTCACTGTCGCA-OH-3'
                            PacI
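The placement of the restriction sites in this cleaved product can be checked with a simple motif search. The sketch below assumes the recognition sequences TTAATTAA for PacI and GGGCCC for Bsp120I (both palindromic, so searching one strand suffices) and builds the construct from random choices of the Example 1 words (first word read as CTTT). Because every word contains a C and no G, neither site can arise within the tag region or across its junctions, so each site occurs exactly once whatever tag is chosen. Helper names are hypothetical.

```python
import random

SITES = {"PacI": "TTAATTAA", "Bsp120I": "GGGCCC"}
WORDS = ["CTTT", "TACT", "ACAT", "AATC", "TTAC", "TCTA", "ATCA", "CAAA"]
FIVE_PRIME = "CCTTAGGGCCC"                      # primer 2 joined to the appended CCC
PRIMER1 = "TCCTTAATTAACTGGTCTCACTGTCGCA"


def find_sites(seq, sites=SITES):
    """0-based start positions of every (possibly overlapping) motif match."""
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[name] = positions
    return hits


rng = random.Random(1)
tag = "".join(rng.choice(WORDS) for _ in range(8))
construct = FIVE_PRIME + tag + PRIMER1          # written 5'->3'
print(find_sites(construct))  # {'PacI': [46], 'Bsp120I': [5]}
```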

The oligonucleotide mixture was purified by HPLC using a 150 mm x 4.6 mm
PLRP-S column from Polymer Laboratories (100 Angstrom pore size, 8 µm particles) on a Gilson
HPLC instrument. A 2-stage solvent gradient was used: Buffer A: 0.2 M triethylammonium acetate
and 2% acetonitrile in water; Buffer B: 100% acetonitrile; flow rate = 1 mL/min at room temperature;
gradient program: 95% A for 1 minute; reduce to 80% A over 5 minutes and hold at 50% A for 5
minutes; return to 95% A over 5 minutes and maintain at 95% A for 10 minutes to re-equilibrate the
column. The entire, broad trityl-on peak was collected between 18 minutes and 20 minutes. The
collected fractions were concentrated in vacuo, treated with glacial acetic acid for 15 minutes to remove
the 5'-trityl groups, and concentrated in vacuo to dryness. The residue was dissolved in 300 µL water
and transferred to a 1.5 mL microfuge tube. To this was added 100 µL of 4 M NaCl, and the mixture
was vortexed. After addition of 1 mL ethanol, the sample was vortexed and then chilled at -20°C for 20
minutes. The precipitated DNA was centrifuged, and the supernatant was removed. The precipitation
procedure was repeated twice more. The final purified mixture was designated BP9.

Example 2
Cloning and Preparation of Tags 
A. Plasmid Cloning Vector. A plasmid cloning vector pLCV2 was created from plasmid vector
pBC SK- (Stratagene) as follows, using the following oligonucleotides:

S-723 (SEQ ID NO:5)
5'-CGA GAA AGA GGG ATA AGG CTC GAG CTT AAT TAA GAG TCG ACG AAT TCG
GGC CCG GAT CCT GAC TCT TTC TCC CT-3'

S-724 (SEQ ID NO:6)
5'-CTA GAG GGA GAA AGA GTC AGG ATC CGG GCC CGA ATT CGT CGA CTC TTA
ATT AAG CTC GAG CCT TAT CCC TCT TTC TCG GTA C-3'

S-785 (SEQ ID NO:7)
5'-TCG AGG CAT AAG TCT TCG AAT TCC ATC ACA CTG GGA AGA CAA CGT AG-3'

S-786 (SEQ ID NO:8)
5'-GAT CCT ACG TTG TCT TCC CAG TGT GAT GGA ATT CGA AGA CTT ATG CC-3'

S-960 (SEQ ID NO:9)
5'-TCG ATT AAT TAA CAA GCT TTG GGC CCT CGA GCA TAA GTC TTC TGC AGA
ATT CGG ATC CAT CGA TGG TCA TAG C-3'

S-961 (SEQ ID NO:10)
5'-TGT TTC CTG CCA CAC AAC ATA CGA GCC GGA AGC GGC CGC TCT AGA-3'

S-962 (SEQ ID NO:11)
5'-AGC GTC TAG AGC GGC CGC TTC CGG CTC GTA TGT TGT GTG GCA GGA AAC
AGC TAT GAC CAT C-3'

S-963 (SEQ ID NO:12)
5'-GAT GGA TCC GAA TTC TGC AGA AGA CTT ATG CTC GAG GGC CCA AAG CTT
GTT AAT TAA-3'

S-1105 (SEQ ID NO:13)
5'-TCGA GGG CCC GCA TAA GTC TTC-3'

S-1106 (SEQ ID NO:14)
5'-TCGA GAA GAC TTA TGC GGG CCC-3'
Oligos S-723 and S-724 were kinased, annealed together, and ligated to pBC SK- that had been
digested with KpnI and XbaI and treated with calf intestinal alkaline phosphatase, to create plasmid
pSW143.1.

Oligos S-785 and S-786 were kinased, annealed together, and ligated to plasmid pSW143.1, which
had been digested with XhoI and BamHI and treated with calf intestinal alkaline phosphatase, to create
plasmid pSW164.02.
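These kinase/anneal/ligate steps depend on each oligonucleotide pair forming a duplex whose single-stranded overhangs match the digested vector ends. The sketch below checks this at the sequence level for two of the listed pairs, assuming each strand carries a 4-nt 5' overhang (TCGA for an XhoI-compatible end, GATC for a BamHI-compatible end) and that the rest of each strand is the exact reverse complement of the other. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_if_paired(top, bottom, overhang=4):
    """Return both 5' overhangs if the strands anneal leaving 4-nt 5' overhangs, else None.

    Strands are written 5'->3' without spaces.
    """
    if top[overhang:] != reverse_complement(bottom[overhang:]):
        return None
    return top[:overhang], bottom[:overhang]


pairs = {
    "S-785/S-786": ("TCGAGGCATAAGTCTTCGAATTCCATCACACTGGGAAGACAACGTAG",
                    "GATCCTACGTTGTCTTCCCAGTGTGATGGAATTCGAAGACTTATGCC"),
    "S-1105/S-1106": ("TCGAGGGCCCGCATAAGTCTTC", "TCGAGAAGACTTATGCGGGCCC"),
}
for name, (top, bottom) in pairs.items():
    print(name, overhangs_if_paired(top, bottom))
# S-785/S-786 ('TCGA', 'GATC')
# S-1105/S-1106 ('TCGA', 'TCGA')
```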

Oligonucleotides S-960, S-961, S-962, and S-963 were kinased and annealed together to form a
duplex consisting of the four oligonucleotides. Plasmid pSW164.02 was digested with XhoI and SapI.
The digested DNA was electrophoresed in an agarose gel, and the approximately 3045 bp product was
purified from the appropriate gel slice. Plasmid pUC4K (from Pharmacia) was digested with PstI and
electrophoresed in an agarose gel. The approx. 1240 bp product was purified from the appropriate gel
slice. The two plasmid products (from pSW164.02 and pUC4K) were ligated together with the
S-960/961/962/963 duplex to create plasmid pLCV1.

DNA from AdenovirusS (New England Biolabs) was digested with PacI and Bsp120I, treated with
calf intestinal alkaline phosphatase, and electrophoresed in an agarose gel. The approx. 2853 bp
product was purified from the appropriate gel slice. This fragment was ligated to plasmid pLCV1,
which had been digested with PacI and Bsp120I, to create plasmid pSW208.14.

Plasmid pSW208.14 was digested with XhoI, treated with calf intestinal alkaline phosphatase, and
electrophoresed in an agarose gel. The approx. 5374 bp product was purified from the appropriate gel
slice. This fragment was ligated to oligonucleotides S-1105 and S-1106 (which had been kinased and
annealed together) to produce plasmid pLCV2, which contained the following elements (SEQ ID
NO:15):

5'-...TTAATTAAGGA[tag]GGGCCCGCATAAGTCTTC[stuffer]GGATCC...-3'
3'-...AATTAATTCCT[tag]CCCGGGCGTATTCAGAAG[stuffer]CCTAGG...-5'
      PacI                        BbsI           BamHI


B. Construction of Tag Plasmid Library. A tag-plasmid library designated pBP9 was created as
follows. A primer designated PEP-1 was synthesized having the sequence

5'-Fam-TGTGGCTTAATTAAGG (PEP-1, SEQ ID NO:16)



(Fam-TTP conjugate obtained from Perkin Elmer-Applied Biosystems Division). This PEP-1 primer
was annealed to the combinatorial tag mixture from Example 1 (designated BP9), and the primer was
extended by treatment with Sequenase DNA polymerase in the presence of the four standard dNTPs.
The double-stranded product was purified, digested with PacI and Bsp120I, and ligated to the larger
fragment produced from pLCV2 by digestion with PacI and Bsp120I and removal of 5'-phosphate
groups with calf intestinal alkaline phosphatase. The ligation product was electrophoresed in an
agarose gel and purified from the gel slice using a QIAEX II kit (Qiagen). This ligated material was
used to transform electrocompetent E. coli TOP10F' (Invitrogen) or XL1-Blue MRF' (Stratagene).
Ligation and transformation conditions were optimized by conducting preliminary ligation reactions
using 1 µL vector fragment (~10 µg), and either no insert, 1 µL insert diluted 1:10, or 1 µL non-diluted
insert (~1 pmol), using a RAPID LIGATION KIT (Boehringer Mannheim). After ligation, the ligation
mixture was used to transform electrocompetent cells, and transformants were plated (100 µL neat, 100
µL of a 1:10 dilution, and 100 µL of a 1:100 dilution). Using this approach, the best vector/insert ratio was
determined as the ratio that produced the highest number of colonies per initial DNA while maintaining
a very low background. For library synthesis, the ligation conditions were scaled up as necessary.
Exemplary conditions are as follows:

200 µL vector (total obtained starting from 20 µg plasmid)
x µL insert (PacI/Bsp120I digest)
50 µL 10X ligation buffer
50 µL 10 mM ATP
y µL water
10 µL T4 DNA ligase (2,000,000 U/mL)
final volume = 500 µL
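Because the fixed components take up 310 µL of the 500 µL reaction, the insert and water volumes x and y must sum to 190 µL, and the buffer and ATP end up at 1X and 1 mM. A minimal sketch of that arithmetic; the helper names and the 40 µL insert volume in the example are hypothetical, chosen only for illustration.

```python
FINAL_UL = 500
FIXED_UL = {"vector": 200, "10X ligation buffer": 50, "10 mM ATP": 50, "T4 DNA ligase": 10}


def water_ul(insert_ul):
    """Water volume y that brings the reaction to FINAL_UL for an insert volume x."""
    y = FINAL_UL - sum(FIXED_UL.values()) - insert_ul
    if y < 0:
        raise ValueError("insert volume exceeds the available 190 uL")
    return y


def final_conc(stock_conc, volume_ul):
    """Concentration after diluting volume_ul of stock into FINAL_UL (C1*V1 = C2*V2)."""
    return stock_conc * volume_ul / FINAL_UL


print(water_ul(40))        # 150
print(final_conc(10, 50))  # 1.0 (10X buffer -> 1X; 10 mM ATP -> 1 mM)
```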



The ligation mixture was incubated overnight at 16°C, followed by extraction twice with one
volume (500 µL) of TE-saturated phenol/chloroform/isoamyl alcohol (25:24:1) and once with chloroform,
followed by ethanol precipitation. After centrifugation, washing, and drying, the pellet was
resuspended in 50 µL water. Into each of several vials of electrocompetent cells (80-100 µL cells) was
added 1 to 2 µL of ligation product. The following conditions were used for electroporation: 200
ohms, 25 µF, 1800 V in a chilled 0.1 cm electroporation cuvette. Immediately after electroporation, 1
mL SOC broth (room temperature; Sambrook et al., supra, at Appendix A.2 under SOC Medium) was
added, and the mixture was transferred to a 15 mL Falcon tube and shaken at 220 rpm at 37°C
for 1 hour. Each transformation mixture was then added to 50 mL prewarmed broth (TB or LB +
chloramphenicol) in a shaking flask. To determine titers, small aliquots of each transformation (5 µL
neat or diluted) were plated on LB agar plates containing 30 µg/mL chloramphenicol,
while the remainder was inoculated into a liquid culture and shaken in an incubator overnight.

Transformations were repeated until the total number of independent transformants (clones) was
estimated, based on dilution analyses, to be approximately 1.7 x 10^8. This number of transformants is
10 times the number of potential tag variants produced by random combinatorial linkage of eight
different tetrameric "word" sequences into tag sequences having a total length of eight words (8^8
= 1.7 x 10^7). Plasmid DNA (pBP9) was extracted from the cultures and pooled. Note that cultures
could be combined before plasmid isolation only if they had similar titers; otherwise, plasmids were
isolated separately and then mixed together in proportion to their titers. Approximately 1.5 mg pBP9
was obtained.
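The dilution analyses used to estimate the number of independent transformants reduce to a short calculation. In this sketch the colony count, dilution, plated volume and total volume of one transformation are hypothetical values, not data from the source; the approximately 1.7 x 10^8 target is the figure stated above, and the function name is hypothetical.

```python
import math


def transformants(colonies, plated_ul, dilution_factor, total_ul):
    """Estimate independent transformants in one transformation from a titer plate."""
    per_ul = colonies * dilution_factor / plated_ul  # per uL of undiluted transformation
    return per_ul * total_ul


# Hypothetical titer: 150 colonies from 5 uL of a 1:10 dilution,
# out of ~1100 uL total (cells + ligation product + 1 mL SOC).
per_tube = transformants(colonies=150, plated_ul=5, dilution_factor=10, total_ul=1100)
target = 1.7e8                               # approximately 10 x 8**8, from the text
print(per_tube)                              # 330000.0
print(math.ceil(target / per_tube))          # transformations needed: 516
```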



C. Poly(A)+ RNA Isolation

Poly(A)+ RNA (approx. 30 µg from 5 x 10^ cells) was extracted from THP-1 cells using a FastTrack kit
(Invitrogen) as recommended by the supplier. Poly(A)+ RNA (approx. 15 µg from 5 x 10^ cells) was
extracted from yeast cells using a STRAIGHT A's mRNA isolation system from Novagen. For cDNA
synthesis, an ethanol suspension of poly(A)+ RNA was centrifuged in a cold microfuge at full speed for 30
minutes. The supernatant was removed and discarded, after which the pellet was recentrifuged briefly
and the remaining supernatant was carefully removed by pipette. The sides of the tube were washed carefully
with 0.75 mL of 70% ethanol, followed by centrifugation for 20 minutes. The supernatant was
removed as before, to ensure that all liquid had been removed. The RNA pellet was resuspended in
DEPC-treated water (32 µL was used for mRNA from 5 x 10^ mammalian cells; otherwise, the volume
was adjusted accordingly). The resultant solution was spun to the bottom of the tube and placed on ice.
RNA concentration was estimated by dilution in water, assuming that 1.0 absorbance unit at 260 nm
(A260) = 40 µg/mL RNA (the A260/A280 ratio should be 2.0). Gloves were worn at all times to avoid
nuclease contamination.
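The A260 rule above converts a diluted reading into a stock concentration. A minimal sketch; the function name and the dilution and absorbance values in the example are hypothetical.

```python
def rna_ug_per_ul(a260, dilution_factor, ug_per_ml_per_a260=40.0):
    """RNA stock concentration (ug/uL) from the A260 of a diluted sample.

    Uses the rule 1.0 A260 unit = 40 ug/mL RNA.
    """
    ug_per_ml = a260 * ug_per_ml_per_a260 * dilution_factor
    return ug_per_ml / 1000.0  # ug/mL -> ug/uL


# Hypothetical reading: 2 uL RNA in a total of 500 uL water (1:250), A260 = 0.12.
conc = rna_ug_per_ul(a260=0.12, dilution_factor=250)
print(round(conc, 3))  # 1.2
```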



D. cDNA Library Construction

First- and second-strand cDNA was synthesized using a cDNA Synthesis Kit (Stratagene) as
recommended by the supplier, using RNase-free tubes, except that the following custom primer was
used for first-strand synthesis (RT primer, SEQ ID NO:17):

P2: 5'-biotin-GACATGCTGCATTGAGACGATTCTTTTTTTTTTTTTTTTTTV
                           BsmBI

where "V" represents a mixture of A, C, and G. In brief, the following solutions were added to a sterile
0.5 mL tube:

2.25 µL 10X first-strand buffer
1.35 µL first-strand methyl nucleotide mixture
1.0 µL RT primer (50 pmol/µL)
n µL DEPC-treated water (n calculated as below)
0.45 µL RNase Block ribonuclease inhibitor (40 U/µL)

The resultant mixture was flicked to mix the components and then spun to the bottom of the tube.
To the mixture was added 2.5 µg poly(A)+ RNA from above (m µL), where m is the volume containing
2.5 µg RNA, and n (the volume of DEPC-treated water) was calculated to produce a final volume of
21.8 µL (n + m = 16.8 µL). To the mixture was added 0.7 µL MMLV reverse transcriptase (50 U/µL).
The mixture was gently vortexed or flicked and then spun to the bottom of the tube. The mixture
was incubated at 37°C for 60 minutes in a heat block or PCR machine, and then placed on ice.
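The volumes m and n defined above follow from the measured RNA concentration. A small sketch, assuming the concentration is already known; the function name and the 1.2 µg/µL value in the example are hypothetical.

```python
RNA_UG = 2.5               # poly(A)+ RNA per reaction
WATER_PLUS_RNA_UL = 16.8   # n + m, from the protocol


def first_strand_volumes(rna_ug_per_ul):
    """Return (m, n): uL of RNA solution and uL of DEPC-treated water."""
    m = RNA_UG / rna_ug_per_ul
    n = WATER_PLUS_RNA_UL - m
    if n < 0:
        raise ValueError("RNA too dilute: 2.5 ug does not fit in 16.8 uL")
    return m, n


m, n = first_strand_volumes(1.2)
print(round(m, 2), round(n, 2))  # 2.08 14.72
```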
For second-strand cDNA synthesis, the first-strand cDNA product (22.5 µL) was placed on ice,

and the following components were added in order:

10.0 µL 10X second-strand buffer

3 µL second-strand nucleotide mixture

57.0 µL sterile water (not DEPC-treated)

1.0 µL [α-32P]dATP (10 µCi/µL, not more than 2 weeks old)






The mixture was flicked to mix the solution and then spun down briefly, followed by equilibration on 
ice for 5 minutes to reduce the number of hairpin structures. Next, the following enzymes were added: 

1.0 µL RNase H (1.5 U/µL)

5.5 µL DNA polymerase I (9.0 U/µL)
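As a bookkeeping check, the second-strand components can be summed. This sketch assumes the 10X second-strand buffer volume reads as 10.0 µL (the value that brings the reaction to 100 µL and the buffer to 1X, consistent with the 100 µL phenol/chloroform extraction that follows); the other volumes are as listed:

```python
# Component volumes in uL, in the order of addition given in the text.
# ASSUMPTION: the 10X second-strand buffer volume is read as 10.0 uL.
SECOND_STRAND_UL = {
    "first-strand cDNA product": 22.5,
    "10X second-strand buffer": 10.0,
    "second-strand nucleotide mixture": 3.0,
    "sterile water": 57.0,
    "[alpha-32P]dATP": 1.0,
    "RNase H (1.5 U/uL)": 1.0,
    "DNA polymerase I (9.0 U/uL)": 5.5,
}

total_ul = sum(SECOND_STRAND_UL.values())
buffer_fold = 10.0 * SECOND_STRAND_UL["10X second-strand buffer"] / total_ul
pol_i_units = 9.0 * SECOND_STRAND_UL["DNA polymerase I (9.0 U/uL)"]

if __name__ == "__main__":
    print(f"total = {total_ul:.1f} uL, buffer = {buffer_fold:.2f}X, "
          f"DNA polymerase I = {pol_i_units:.1f} U")
```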



The resultant solution was mixed and spun briefly, without allowing the temperature to exceed 16°C.
The mixture was incubated at 16°C for 150 minutes (but not longer), after which it was placed
on ice.

To the reaction tube was added 100 µL TE-saturated phenol/chloroform/isoamyl alcohol (25:24:1),
and the mixture was vortexed thoroughly. The mixture was centrifuged for 5 minutes. The aqueous
(upper) layer was carefully removed and transferred to a new tube without collecting any of the
interface. 100 µL of chloroform:isoamyl alcohol (24:1) was added, and the mixture was vortexed
thoroughly. The mixture was centrifuged for 5 minutes. The aqueous layer was collected and
transferred to a new tube as before, and 1 µL of 5 M NaCl was added and vortexed.

The resultant double-stranded hemimethylated cDNA was size-fractionated on a Pharmacia
SIZESEP 400 column (Cat. No. 27-5105). The column was equilibrated in STE buffer (1X STE = 100
mM NaCl, 20 mM Tris pH 7.5, 10 mM EDTA) by placing the SIZESEP gel in a spin column and
allowing the buffer to drain out until the top of the gel bed was reached, followed by suspension of the
gel bed in 2 mL TBE and allowing the buffer to drain to the top of the gel bed, and repeating the TBE
suspension and draining steps two more times. The column was capped at both ends until use, to
prevent drying. Immediately before use, the caps were removed, and the column was spun briefly at
about 400 x g for 2 minutes to compact the gel and remove excess buffer. Next, the cDNA sample was
applied carefully and slowly to the center of the flat surface at the top of the gel bed, so that the sample
did not flow past the sides of the gel or through any cracks. The column was placed into a Falcon 2059
tube so that the lower tip of the column was positioned over the top of a capless 1.5 mL microfuge tube to
collect column effluent. The column was spun in the Falcon tube at about 400 x g for 2 minutes, so that
the synthesis primer and small cDNAs were retained in the column, while cDNAs of the desired length
collected in the microfuge tube. The collected cDNA mixture was transferred to a new, capped tube.
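The spin steps above are specified as relative centrifugal force (x g). If a centrifuge is set in rpm instead, the standard relation RCF = 1.118 x 10^-5 x r x rpm^2 (r = rotor radius in cm) converts between the two. A small sketch; the 10 cm radius is purely hypothetical and should be replaced by the radius of the rotor actually used:

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * radius_cm * rpm**2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF at the given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 10.0  # hypothetical rotor radius
    print(f"400 x g at r = {radius_cm} cm is about {rpm_for_rcf(400, radius_cm):.0f} rpm")
```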
To the collected cDNA were added (optionally) 1 µL PELLET PAINT (Novagen, Madison, WI), 0.1
volume of 3 M NaOAc, and 2.5 volumes EtOH. The resultant solution was stored at -70°C for at least
30 minutes or -20°C overnight. The double-stranded cDNA product had the following formula (SEQ
ID NO: 18):

5'-biot-GACATGCTGCATTGAGACGATTCTTTTTTTTTTTTTTTTTTVXXX ... XGATCXXX
        CTGTACGACGTAACTCTGCTAAGAAAAAAAAAAAAAAAAAABXXX ... XCTAGXXX

BsmBI DpnII 



where the X's indicate nucleotides in the cDNAs, V represents A, C or G, and B represents C, G or T.
Note that the RT primer sequence was selected to give a BsmBI site in the cDNAs, which results in a
5'-GCAT overhang upon digestion with BsmBI.
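Where the GCAT overhang comes from can be checked by locating the BsmBI site in the primer. BsmBI recognizes CGTCTC and cuts 1 nt (top strand) and 5 nt (bottom strand) beyond it; in this primer the site appears in reverse orientation as GAGACG. A sketch under those assumptions, using the defined part of P2 with an arbitrary run of T's (the exact T count does not affect the cut):

```python
from __future__ import annotations

FWD_SITE = "CGTCTC"  # BsmBI recognition sequence, cuts CGTCTC(N1)/(N5)
REV_SITE = "GAGACG"  # the same site in reverse orientation


def bsmbi_cuts(top: str) -> list[tuple[int, int, str]]:
    """Return (top_cut, bottom_cut, overhang) for each BsmBI site.

    Positions are 0-based indices into the top strand; a cut at index k
    breaks the backbone just before top[k]. The overhang is the 4-nt
    single-stranded region as it reads 5'->3' along the top strand.
    """
    top = top.upper()
    cuts = []
    for i in range(len(top) - 5):
        window = top[i:i + 6]
        if window == FWD_SITE:
            top_cut, bottom_cut = i + 7, i + 11
        elif window == REV_SITE:
            top_cut, bottom_cut = i - 5, i - 1
        else:
            continue
        lo, hi = sorted((top_cut, bottom_cut))
        if lo < 0 or hi > len(top):
            continue  # cut would fall outside the sequence provided
        cuts.append((top_cut, bottom_cut, top[lo:hi]))
    return cuts


# Defined portion of primer P2 (SEQ ID NO: 17); the T-run length is arbitrary here.
primer = "GACATGCTGCATTGAGACGATTC" + "T" * 18

if __name__ == "__main__":
    for top_cut, bottom_cut, overhang in bsmbi_cuts(primer):
        print(top_cut, bottom_cut, overhang)
        print(primer[top_cut:top_cut + 15])
```

The released top strand begins GCATTGAGACGATTC, matching the 5' end of the released fragment given as SEQ ID NO: 19.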

After spinning the ethanol/NaOAc/cDNA solution for 30 minutes in a cold microfuge and washing
with 70% ethanol as before, the size-fractionated cDNA was digested to completion with DpnII by
resuspending the cDNA pellet in a solution of 170 µL water, 20 µL 10X DpnII buffer, and 10 µL DpnII
(10 U/µL), followed by incubation at 37°C for 2 hours.
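Because DpnII cleaves immediately 5' of every GATC (^GATC), a complete digest leaves only the stretch from the biotinylated 5' end up to the first GATC attached to biotin; that is the piece captured on streptavidin in the next step. A sketch of that logic; the cDNA sequence and the choice of A for V are hypothetical:

```python
DPNII_SITE = "GATC"  # DpnII cleaves 5' of the site (^GATC), leaving 5'-GATC overhangs


def biotinylated_top_fragment(top: str) -> str:
    """Top-strand piece still linked to the 5'-biotin after complete DpnII digestion.

    A complete digest cuts at every GATC, so only the span from the
    biotinylated 5' end up to the first site remains attached to biotin.
    """
    cut = top.upper().find(DPNII_SITE)
    return top if cut == -1 else top[:cut]


# Hypothetical example: primer-derived region, V taken as A, then made-up cDNA.
primer_region = "GACATGCTGCATTGAGACGATTC" + "T" * 18 + "A"
cdna = "CCGTAGGATCTTAGCGATCAA"
fragment = biotinylated_top_fragment(primer_region + cdna)

if __name__ == "__main__":
    print(fragment[-10:])
```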

The biotinylated fragments were purified using Dynal M-280 streptavidin beads. In preparation
for biotin capture, the magnetic streptavidin beads were resuspended by vortexing, and 100 µL of beads (1
mg) was transferred to a fresh 1.5 mL microfuge tube. The tube was placed in a magnetic particle
concentrator (MPC) for at least 60 seconds to allow the beads to pellet near the magnet. The
supernatant was removed and discarded while the tube was in the MPC. To the tube was added 100 µL
1X B&W buffer (2X B&W buffer = 20 mM Tris-HCl pH 8.0, 2 mM EDTA, and 2 M NaCl), and the
suspension was vortexed and placed in the MPC to concentrate the magnetic beads, and the supernatant
was removed as before. This washing procedure was repeated twice more. At no time were the beads
allowed to dry during the above procedure.
To the DpnII digest from above (200 µL) was added 200 µL of 2X B&W buffer. This mixture
was transferred to a tube containing the prepared magnetic streptavidin beads, and the beads were
resuspended by flicking the tube. The tube was incubated in a Thermomixer at 25°C for 30 minutes so
that the beads remained fully suspended at all times, to help ensure biotin capture. Next, the tube was placed
in the MPC, and the supernatant was withdrawn and stored at -20°C. The beads were washed five
times with 400 µL 1X B&W buffer, and the supernatants were analyzed to verify that little or no
radioactivity was present after the third or fourth wash.
To remove the tagged cDNA fragments, the following steps were performed:

1. Resuspend beads in a mixture of 20 µL 10X NEB Buffer 3 (New England Biolabs) and 168 µL
water.

2. Add 2 µL bovine serum albumin 100X stock (10 mg/mL BSA, New England Biolabs).

3. Add 10 µL BsmBI (4 U/µL).

4. Shake tube in Thermomixer at 55°C for 2 hours, so that beads remain fully suspended.

5. Add 7 µL BsmBI 10X stock solution, and shake tube at 55°C for 1-2 additional hours.

6. Place tube in MPC for 60 seconds. Collect supernatant, which contains released tag sequences.

7. Add 2 µL PELLET PAINT. Extract once with 200 µL phenol/chloroform and once with 200
µL chloroform.

To the resultant aqueous layer were added 20 µL 3 M NaOAc and 500 µL EtOH. This mixture was
stored at -20°C overnight or at -70°C for 30 minutes. The solution was then spun in a microfuge for at
least 30 minutes. The supernatant was removed and discarded, and the pellet was washed with, and
centrifuged in, 0.5 mL of 70% EtOH. After the supernatant was removed, the cDNA pellet was
resuspended in 10 µL water. The released cDNA fragment mixture had the following formula (SEQ ID
NO: 19):

5'-GCATTGAGACGATTCTTTTTTTTTTTTTTTTTTVXXX ... X-3'

    3'-ACTCTGCTAAGAAAAAAAAAAAAAAAAAABXXX ... XCTAG-5'






The pBP9 tag-vector from above was digested with BbsI and BamHI, treated with calf intestinal
alkaline phosphatase, and purified by electrophoresis in an agarose gel. The larger fragment
(approximately 3.15 kb) was purified from the gel and combined with the cDNA fragment mixture for
ligation. Note that the vector has been engineered so that BbsI digestion produces an end compatible
with the BsmBI-digested end of the cDNAs. The purified vector fragment was ligated to the cDNA
mixture prepared as above.
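End compatibility here means that two 4-nt 5' overhangs can base-pair: written 5'->3', one must be the reverse complement of the other. The sketch below derives what the vector's BbsI end (BbsI cuts GAAGAC(N2)/(N6)) would have to present to pair with the cDNA's 5'-GCAT end; the resulting "ATGC" is derived here, not quoted from the text. The same check shows why the DpnII end (5'-GATC) ligates to the vector's BamHI end (G^GATCC, also 5'-GATC):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhangs_compatible(overhang_a: str, overhang_b: str) -> bool:
    """True if two 5' overhangs, each written 5'->3', can base-pair end to end."""
    return len(overhang_a) == len(overhang_b) and overhang_a.upper() == revcomp(overhang_b)


cdna_bsmbi_end = "GCAT"  # stated in the text
cdna_dpnii_end = "GATC"  # DpnII leaves 5'-GATC
bamhi_end = "GATC"       # BamHI (G^GATCC) leaves 5'-GATC

# Derived, not stated in the text: the overhang the vector's BbsI end would need.
required_vector_bbsi_end = revcomp(cdna_bsmbi_end)

if __name__ == "__main__":
    print("vector BbsI end must be 5'-" + required_vector_bbsi_end)
    print("DpnII/BamHI compatible:", overhangs_compatible(cdna_dpnii_end, bamhi_end))
```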

To produce a vector library of tagged cDNA inserts, the ligated vector-insert mixture was used to
transform electrocompetent E. coli TOP10 cells (Invitrogen). A small quantity of the transformation
was plated on LB agar plates containing 30 µg/mL chloramphenicol to determine titer. Aliquots
comprising approximately 1.6 x 10^ clones were used to inoculate liquid cultures in order to expand the
library. Plasmid DNA was extracted from these cultures and stored.
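Titering and the choice of aliquot size reduce to two divisions. A sketch assuming a conventional plate-count titer; the colony count, plated volume, dilution, and target clone number are all hypothetical, and the helper names are illustrative:

```python
def titer_cfu_per_ul(colonies: int, plated_ul: float, dilution_factor: float) -> float:
    """Transformants per uL of undiluted transformation mixture."""
    return colonies * dilution_factor / plated_ul


def volume_for_clones(target_clones: float, titer: float) -> float:
    """uL of undiluted transformation expected to carry target_clones transformants."""
    return target_clones / titer


if __name__ == "__main__":
    # Hypothetical plate count: 160 colonies from 10 uL of a 1:100 dilution.
    titer = titer_cfu_per_ul(160, 10.0, 100)
    # Hypothetical target clone count for one culture aliquot.
    print(f"{titer:.0f} cfu/uL; {volume_for_clones(2.0e5, titer):.1f} uL for 2e5 clones")
```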


E. Hybridization of Tagged Fragments to Immobilized Tag Complements
The tagged DNA inserts can be captured or sorted by hybridization to an array of immobilized tag
complements such as those described above.

Preferably, the tag-cDNA conjugates are amplified from the vectors by PCR, using a conventional
protocol such as the following. For each of 8 replicate PCRs, the following reaction components are
combined: 1 µL vector DNA (125 ng/µL for a library, 10^ copies for a single clone); 10 µL 10X
KLENTAQ Buffer (Clontech Labs, Palo Alto, CA); 0.25 µL biotinylated 20-mer "forward" PCR
primer (1 nmol/µL); 0.25 µL FAM-labeled 20-mer "reverse" PCR primer (1 nmol/µL); 1 µL 25 mM
dATP, dGTP, dTTP, and 5-methyl-dCTP (total dNTP concentration 100 mM); 5 µL DMSO; 2 µL 50X
KLENTAQ enzyme; and 80.5 µL water, for a total volume of 100 µL. The PCR reactions are run in an
MJR DNA Engine (MJ Research), or a similar thermocycler, with the following protocol: (1) 94°C for 4
min; (2) 94°C for 30 sec; (3) 6rC for 3 min; (4) 8 cycles of steps 2 and 3; (5) 94°C for 30 sec; (6) 64°C
for 3 min; (7) 22 cycles of steps 5 and 6; (8) 67°C for 3 min; and (9) hold at 4°C.
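The reaction recipe and cycling program can be tabulated to check the totals, scale a master mix for the 8 replicates, and estimate run time. This sketch assumes "8 cycles of steps 2 and 3" means 8 passes in total (likewise 22), and it ignores ramping between temperatures:

```python
# Per-reaction volumes (uL) as listed in the text.
PER_REACTION_UL = {
    "vector DNA": 1.0,
    "10X KLENTAQ buffer": 10.0,
    "biotinylated forward primer (1 nmol/uL)": 0.25,
    "FAM-labeled reverse primer (1 nmol/uL)": 0.25,
    "dNTP mix (25 mM each, 100 mM total)": 1.0,
    "DMSO": 5.0,
    "50X KLENTAQ enzyme": 2.0,
    "water": 80.5,
}
N_REACTIONS = 8

# Timed program segments: (hold times in seconds, number of passes).
# ASSUMPTION: "8 cycles of steps 2 and 3" is read as 8 passes in total, not 1 + 8.
PROGRAM = [
    ((240,), 1),      # (1) 94 C, 4 min
    ((30, 180), 8),   # (2)-(4)
    ((30, 180), 22),  # (5)-(7)
    ((180,), 1),      # (8) 67 C, 3 min; the 4 C hold is open-ended
]

total_ul = sum(PER_REACTION_UL.values())
# 1 nmol/uL x 0.25 uL = 250 pmol; pmol per uL equals uM.
primer_final_um = 1.0 * 0.25 * 1000.0 / total_ul
dntp_total_final_mm = 100.0 * 1.0 / total_ul
run_minutes = sum(sum(holds) * passes for holds, passes in PROGRAM) / 60

if __name__ == "__main__":
    for name, vol in PER_REACTION_UL.items():
        print(f"{name:42s} {vol:6.2f} uL  x{N_REACTIONS} = {vol * N_REACTIONS:7.2f} uL")
    print(f"total {total_ul:.1f} uL; each primer {primer_final_um:.1f} uM; "
          f"dNTPs {dntp_total_final_mm:.1f} mM total")
    print(f"timed holds: {run_minutes:.0f} min (ramping not included)")
```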

The 8 PCR mixtures are pooled and 700 µL phenol is added at room temperature, after which the
combined mixture is vortexed for 20-30 sec and then centrifuged at high speed for 3 min (e.g., 14,000
rpm in an Eppendorf benchtop centrifuge or a similar instrument). The supernatant is removed and
combined with 700 µL chloroform (24:1 chloroform:isoamyl alcohol) in a new tube,
vortexed for 20-30 sec, and centrifuged for 1 min, after which the supernatant is transferred to a new
tube and combined with 80 µL 3 M sodium acetate and 580 µL isopropanol. After centrifuging for 20
min, the supernatant is removed and 1 mL 70% ethanol is added. The mixture is centrifuged for 5-10
min, after which the ethanol is removed, and the precipitated DNA is dried in a microcentrifuge.

After resuspension, the cDNA is purified on avidinated magnetic beads (Dynal) using the
manufacturer's recommended protocol. The cDNA is then digested with PacI (1 unit of enzyme per µg
DNA), also using the manufacturer's recommended protocol (New England Biolabs, Beverly, MA).
The cleaved DNA is extracted with phenol/chloroform followed by ethanol precipitation. The tags of
the tag-cDNA conjugates are rendered single-stranded by adding 2 units of T4 DNA polymerase (New
England Biolabs) per µg of streptavidin-purified DNA. 150 µg of streptavidin-purified DNA is
resuspended in 200 µL water and combined with the following reaction components: 30 µL 10X NEB
Buffer No. 2 (New England Biolabs); 9 µL 100 mM dGTP; 30 µL T4 DNA polymerase (10 U/µL); and
31 µL water, to give a final reaction volume of 300 µL. After incubation for 1 hour at 37°C, the
reaction is stopped by adding 20 µL 0.5 M EDTA, and the T4 DNA polymerase is inactivated by
incubating the reaction mixture for 20 min at 75°C. The tag-cDNA conjugates are purified by
phenol/chloroform extraction and ethanol precipitation.
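The T4 DNA polymerase volume follows from the 2 U/µg dose, and the components should sum to the stated 300 µL. A sketch assuming the DNA amount is 150 µg (the reading implied by 2 U/µg x 150 = 300 U = 30 µL of a 10 U/µL stock):

```python
DNA_UG = 150.0            # streptavidin-purified DNA (ASSUMPTION: the amount is in ug)
UNITS_PER_UG = 2.0        # T4 DNA polymerase dose from the text
T4_STOCK_U_PER_UL = 10.0

t4_ul = DNA_UG * UNITS_PER_UG / T4_STOCK_U_PER_UL

REACTION_UL = {
    "DNA in water": 200.0,
    "10X NEB Buffer No. 2": 30.0,
    "100 mM dGTP": 9.0,
    "T4 DNA polymerase (10 U/uL)": t4_ul,
    "water": 31.0,
}

total_ul = sum(REACTION_UL.values())
buffer_fold = 10.0 * REACTION_UL["10X NEB Buffer No. 2"] / total_ul
dgtp_final_mm = 100.0 * REACTION_UL["100 mM dGTP"] / total_ul

if __name__ == "__main__":
    print(f"T4 pol {t4_ul:.0f} uL; total {total_ul:.0f} uL; "
          f"buffer {buffer_fold:.1f}X; dGTP {dgtp_final_mm:.1f} mM")
```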

For hybridization of tag-cDNA conjugates to immobilized tag complements, the tag-cDNA
conjugates prepared as above are suspended in 50 µL water, and the resulting mixture is combined with
40 µL 2.5X hybridization buffer. The combined mixture is filtered through a SPIN-X spin column (0.22
µm) using a conventional protocol to give a filtrate containing the tag-cDNA conjugates (5 mL of 2.5X
hybridization buffer consists of 1.25 mL 0.1 M sodium phosphate, pH 7.2, 1.25 mL 5 M NaCl, 0.25
mL 0.5% TWEEN 20, 1.5 mL 25% dextran sulfate, and 0.75 mL water). The number of beads in a
given volume is readily estimated with a hemocytometer. Approximately 1.8 x 10^ beads derivatized
with tag complements as above are suspended in 10 µL TE/TWEEN buffer (TE with 0.1% TWEEN 20)
and centrifuged so that the beads form a pellet, from which the TE/TWEEN is removed. To the beads,
25 µL of 1X hybridization buffer (10 mM sodium phosphate, pH 7.2, 500 mM NaCl, 0.01% TWEEN
20, and 3% dextran sulfate) is added and the mixture is vortexed to fully resuspend the beads, after
which the mixture is centrifuged so that the beads form a pellet, and the supernatant is removed.
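The 2.5X recipe and the 1X composition above can be cross-checked by computing final concentrations from stock concentration x volume / total volume. A minimal sketch using only the values given in the text:

```python
# (stock concentration, volume in mL) for 5 mL of 2.5X buffer, as listed in the text.
RECIPE_2_5X = {
    "sodium phosphate, mM": (100.0, 1.25),  # 0.1 M stock
    "NaCl, mM": (5000.0, 1.25),             # 5 M stock
    "TWEEN 20, %": (0.5, 0.25),
    "dextran sulfate, %": (25.0, 1.5),
}
WATER_ML = 0.75

total_ml = WATER_ML + sum(vol for _, vol in RECIPE_2_5X.values())
final_2_5x = {k: stock * vol / total_ml for k, (stock, vol) in RECIPE_2_5X.items()}
final_1x = {k: c / 2.5 for k, c in final_2_5x.items()}

if __name__ == "__main__":
    print(f"total volume: {total_ml} mL")
    for k, c in final_1x.items():
        print(f"1X {k}: {c:g}")
```

Each 1X value agrees with the 1X hybridization buffer composition stated in the text (10 mM phosphate, 500 mM NaCl, 0.01% TWEEN 20, 3% dextran sulfate).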
The tag-DNA conjugates from the above filtrate are incubated at 75°C for 3 min, and then are
combined with the beads. The mixture is vortexed to fully resuspend the beads. The resulting mixture
is further incubated at 15°C with vortexing for approximately three days (60 hours). After
hybridization, the mixture is centrifuged for 2 min and the supernatant is removed. The beads are
washed twice with 500 µL TE/TWEEN 20, and are resuspended in 500 µL 1X NEB buffer No. 2 with
0.1% TWEEN 20. The beads are incubated at 64°C in this solution for 30 min, after which the mixture
is centrifuged so that the beads form a pellet. The supernatant is removed, and the beads are
resuspended in 500 µL TE/TWEEN. The hybridized conjugates may then be analyzed by any method
known in the art or discussed herein.


Example 3 
Succinate-Based Linking Moieties 
The following procedure was used to make GMA beads derivatized with a cleavable succinate
ester linking moiety (10%) and a non-cleavable succinate amide linking moiety (90%).

Succinate Ester. Succinic anhydride (240 mg, 2.4 mmol) was added in portions over 30 min to a
stirred solution of 5'-O-dimethoxytrityl thymidine (3 mmol) in anhydrous pyridine (6 mL) containing
4-dimethylaminopyridine (180 mg, 3.5 mmol). The reaction was stirred overnight and monitored by thin
layer chromatography using chloroform:methanol 9:1 (v:v). The mixture was dried to a gum under
reduced pressure. Residual pyridine was removed by co-evaporation with dry toluene (3 x 20 mL). The
product was dissolved in dichloromethane (20 mL) and washed with ice-cold 10% aqueous citric acid
(2 x 15 mL) and then with water (2 x 15 mL). The organic phase was dried over anhydrous sodium
sulfate and evaporated under reduced pressure. The resultant foam was dissolved in dichloromethane
(10 mL) and precipitated at room temperature into rapidly stirred hexane (250 mL). After the mixture
was centrifuged, the supernatant was decanted. The precipitate was dried under reduced pressure to
give a yield of 75-80% (ester product).
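The reagent quantities can be checked by converting mass to moles. A minimal sketch; the molecular weight of succinic anhydride (C4H4O3, about 100.07 g/mol) is a standard value, not one given in the text:

```python
SUCCINIC_ANHYDRIDE_MW = 100.07  # g/mol, C4H4O3 (standard value)


def mmol_from_mg(mass_mg: float, mw_g_per_mol: float) -> float:
    """Milligrams divided by g/mol gives millimoles."""
    return mass_mg / mw_g_per_mol


if __name__ == "__main__":
    amount = mmol_from_mg(240, SUCCINIC_ANHYDRIDE_MW)
    print(f"240 mg succinic anhydride = {amount:.2f} mmol")
```

This reproduces the 2.4 mmol stated for 240 mg of succinic anhydride.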

Succinamide. The above procedure for succinylation was repeated with an identical amount of
5'-O-dimethoxytrityl-3'-deoxy-3'-aminothymidine instead of 5'-O-dimethoxytrityl thymidine, to form the
corresponding succinamide product (amide product).
Attachment to Support. 0.821 g (1.1 mmol) of the ester product or amide product was dissolved
and stirred for 2 min in a mixture of 5 mL 0.2 M HOBt (1-hydroxybenzotriazole hydrate, 0.27 g in a
mixture of 4.7 mL DMSO (dimethyl sulfoxide) and 4.7 mL NMP (N-methylpyrrolidone)) and 5 mL of
0.2 M HBTU (O-benzotriazol-1-yl-N,N,N',N'-tetramethyluronium hexafluorophosphate, 1.03 g in a
mixture of 1.4 mL N,N-diisopropylethylamine, 9.3 mL DMSO and 9.3 mL NMP). Amino-derivatized
GMA beads (5.5 g, 5.6 µm diameter) were stirred in 10 mL DMF. To this slurry were added all of the
amide product from above and an amount of ester product equal to 1/9 of the amide product (9:1 molar
ratio). The slurry was placed in a screw-cap Falcon tube and gently shaken for 2 hours. The mixture
was then filtered and washed successively with 20 mL each of DMF, acetonitrile, methanol, and ether.
The dried beads were then ready for synthesis of tag complements.
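The 9:1 amide:ester loading generalizes to any target cleavable fraction f: ester = amide x f / (1 - f). This sketch assumes both activated products couple to the beads with similar efficiency, so that the loading ratio carries through to the bead surface; that assumption is not stated in the text, and `ester_mmol_for` is a hypothetical helper:

```python
def ester_mmol_for(amide_mmol: float, cleavable_fraction: float = 0.10) -> float:
    """mmol of cleavable ester so that ester / (ester + amide) == cleavable_fraction.

    ASSUMPTION: ester and amide products couple with similar efficiency.
    """
    if not 0.0 < cleavable_fraction < 1.0:
        raise ValueError("cleavable_fraction must be between 0 and 1")
    return amide_mmol * cleavable_fraction / (1.0 - cleavable_fraction)


if __name__ == "__main__":
    for f in (0.10, 0.20, 0.30):
        print(f"{f:.0%} cleavable: {ester_mmol_for(1.1, f):.3f} mmol ester per 1.1 mmol amide")
```

At f = 0.10 this reduces to the text's "1/9 of the amide product".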

Although the invention has been described with respect to particular embodiments, it will be
appreciated that various modifications can be made without departing from the spirit of the invention.
1. A method of synthesizing a repertoire of oligonucleotide tags, the method comprising:
synthesizing a plurality of different sequence oligonucleotide populations on one or more solid phase 

supports, such that each population is located at a spatially discrete region relative to the other populations, 
and the oligonucleotides in each population comprise a tag-complement sequence that is the same for every 
oligonucleotide in a given population, 

cleaving a fraction of the oligonucleotides from each population on the one or more supports, to 
release a mixture of different-sequence tag-complement oligonucleotides, and 

annealing a primer to each tag-complement oligonucleotide and extending each annealed primer to
form a duplex comprising a tag-complement strand and a complementary tag strand, such that the
tag-complement strands or the duplexes thus formed comprise an oligonucleotide tag repertoire.

2. The method of claim 1, wherein said oligonucleotides are bound to the one or more supports by
their 5'-termini, and each oligonucleotide contains a universal primer-binding sequence on the 3'-side of the
tag complement sequence.

3. The method of claim 1, wherein said fraction is from 10 to 30%.

4. The method of claim 1, wherein a selected fraction of each population of synthesized
oligonucleotides is bound to the one or more supports via base-cleavable linkages, and said cleaving 
includes subjecting the immobilized oligonucleotides to basic conditions effective to cleave said fraction of 
oligonucleotides from the one or more supports. 

5. The method of claim 1, wherein the tag complement sequence in each said oligonucleotide is
flanked by first and second endonuclease restriction sites which may be the same or different. 

6. The method of claim 1, wherein said repertoire of oligonucleotide tag complements contains at
least 1000 different-sequence tag complements. 

7. The method of claim 1, wherein said repertoire of oligonucleotide tag complements contains at
least 10,000 different-sequence tag complements. 

8. The method of claim 1, wherein said tag complement sequences are from 12 to 60 nucleotides in 
length. 

9. The method of claim 1, wherein said tag complement sequences are from 18 to 40 nucleotides in
length. 
10. The method of claim 1, wherein said one or more solid phase supports are microparticles.

11. The method of claim 10, wherein said microparticle(s) have diameters of from about 5 to about 40
µm.

12. The method of claim 1, wherein said tag complement sequences each contain a plurality of
segments that are (i) three to nine nucleotides in length and (ii) selected from a minimally cross-hybridizing 
set of tag complement sequences. 
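Building tags from segments drawn from a small word set makes the repertoire size grow as (set size)^(number of segments). The sketch below uses three hypothetical G-free 4-mers and Hamming distance as a crude stand-in for cross-hybridization; the patent's actual word set and selection criteria are not given in this section:

```python
from __future__ import annotations

from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length words."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(words: list[str]) -> int:
    """Smallest Hamming distance over all pairs (a rough cross-hybridization proxy)."""
    return min(hamming(a, b) for a, b in combinations(words, 2))


def build_tags(words: list[str], n_segments: int) -> list[str]:
    """All tags formed by concatenating n_segments words from the set."""
    return ["".join(p) for p in product(words, repeat=n_segments)]


# Hypothetical 4-nt segment set, chosen so every pair differs at all 4 positions.
WORDS = ["ACAT", "CTCA", "TATC"]

if __name__ == "__main__":
    tags = build_tags(WORDS, 7)
    print(min_pairwise_distance(WORDS), len(tags), len(tags[0]))
```

With 7 segments this gives 3^7 = 2187 distinct 28-nt tags, which would fall within the ranges recited in claims 6 and 9.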

13. A repertoire of tag oligonucleotides prepared by the method of claim 1.

14. A method of forming a tag-vector library comprising multiple members from a tag repertoire, the
method comprising the steps of: 

synthesizing a plurality of oligonucleotide populations on one or more solid phase supports, such that 
each population is located at a spatially discrete region relative to the other populations, and the 
oligonucleotides in each population comprise a tag-complement sequence that is the same for every 
oligonucleotide in a given population, 

cleaving a fraction of the oligonucleotides from each population on the one or more supports, to 
release a mixture of different-sequence tag-complement oligonucleotides,

annealing a primer to each tag-complement oligonucleotide and extending each annealed primer to
form a duplex comprising a tag-complement strand and a complementary tag strand, such that the
tag-complement strands or the duplexes thus formed comprise an oligonucleotide tag repertoire, and

inserting said duplexes by ligation into multiple copies of a selected vector, to form a tag-vector 
library comprising members of said tag repertoire. 

15. A tag-vector library formed by the method of claim 14.

16. A method of forming a library of tagged polynucleotide fragments, the method comprising the
steps of: 

synthesizing a plurality of oligonucleotide populations on one or more solid phase supports, such that 
each population is located at a spatially discrete region relative to the other populations, and the 
oligonucleotides in each population comprise a tag-complement sequence that is the same for every 
oligonucleotide in a given population, 

cleaving a fraction of the oligonucleotides from each population on the one or more supports, to 
release a mixture of different-sequence tag-complement oligonucleotides, 

annealing a primer to each tag-complement oligonucleotide and extending each annealed primer to 
form a duplex comprising a tag-complement strand and a complementary tag strand, such that the
tag-complement strands or the duplexes thus formed comprise an oligonucleotide tag repertoire,
inserting said duplexes by ligation into multiple copies of a selected vector, to form a tag-vector
library comprising members of said tag repertoire, and 

combining said tag-vector library with a mixture of different polynucleotide fragments under 
ligation conditions effective to insert a polynucleotide fragment into each tag-vector at a site adjacent to a
tag-sequence in each vector, thereby forming a library of tagged polynucleotide fragments. 

17. A library of tagged polynucleotide fragments formed by the method of claim 16.

18. A system for sorting polynucleotides, said system comprising: 

one or more solid phase supports on which are attached a plurality of oligonucleotide populations, 
such that each population is located at a spatially discrete region on said one or more supports relative to 
the other populations, and the oligonucleotides in each population comprise a tag-complement sequence
that is the same for every oligonucleotide in a given population, and 

a tag composition selected from the group consisting of a tag repertoire prepared by the method of 
claim 1, a tag-vector library prepared by the method of claim 14, or a library of tagged polynucleotide
fragments prepared by the method of claim 16, which tag composition is prepared from said one or more
supports recited in the preceding paragraph. 

19. A solid phase bead comprising a population of oligonucleotides attached to the bead, said 
population comprising first and second classes of oligonucleotides which contain identical oligonucleotide 
sequences, wherein the oligonucleotides in said first class contain a cleavable linking moiety that permits 
selective cleavage of the first class of oligonucleotides from the support, without significantly cleaving the 
second class of oligonucleotides. 

20. A mixture of solid phase beads, said mixture comprising 

a plurality of beads, each having a population of oligonucleotides attached thereto, wherein each 
population comprises first and second classes of oligonucleotides such that the oligonucleotides in the first 
class contain a cleavable linking moiety that permits selective cleavage of the first class of oligonucleotides
from the support, without significantly cleaving the second class of oligonucleotides, and the 
oligonucleotides in both classes in each population comprise a tag-complement sequence that is the same 
for every oligonucleotide in a given population. 